[jira] [Commented] (DRILL-786) Implement CROSS JOIN

Igor Guzenko (JIRA) Mon, 12 Nov 2018 01:55:18 -0800


    [ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683489#comment-16683489
 ]


Igor Guzenko commented on DRILL-786:
------------------------------------

Documentation note: 

Due to it's nature cross joins can produce extremely large results, and we 
don't recommend to use the feature if you don't know that results won't cause 
out of memory errors. That's why cross joins are disabled by default, to allow 
explicit cross join syntax you'll have to enable it by setting  
_*planner.enable_nljoin_for_scalar_only*_ option to _*false*_. There is also 
another limitation related to usage of aggregation function over cross join 
relation. When input row count for aggregate function is bigger than value of 
_*planner.slice_target*_ option then query can't be planned (because 2 phase 
aggregation can't be created in such case), as a workaround you should set 
*_planner.enable_multiphase_agg_* to _*false*_. This limitation will be active 
until fix of https://issues.apache.org/jira/browse/DRILL-6839. 

> Implement CROSS JOIN
> --------------------
>
>                 Key: DRILL-786
>                 URL: https://issues.apache.org/jira/browse/DRILL-786
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning &amp; Optimization
>    Affects Versions: 1.14.0
>            Reporter: Krystal
>            Assignee: Igor Guzenko
>            Priority: Major
>              Labels: doc-impacting
>             Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.5904900000000001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 
> network}
> rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#337:AbstractConverter.LOGICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#332:Subset#22.PHYSICAL.ANY([]).[], best=rel#335, importance=0.531441
> rel#334:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#338:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], 
> selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], 
> SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> rel#336:Subset#22.PHYSICAL.SINGLETON([]).[], best=rel#335, 
> importance=0.4782969000000001
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], 
> selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], 
> SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> Set#23, type: (DrillRecordRow[*, age, name, studentnum])
> rel#308:Subset#23.LOGICAL.ANY([]).[], best=rel#307, importance=0.6561
> rel#307:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 
> network}
> rel#343:AbstractConverter.LOGICAL.ANY([]).[](child=rel#342:Subset#23.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#342:Subset#23.PHYSICAL.SINGLETON([]).[], best=rel#341, 
> importance=0.5904900000000001
> rel#344:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#308:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#341:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 
> network}
> Set#24, type: (DrillRecordRow[*, age])
> rel#309:Subset#24.LOGICAL.ANY([]).[], best=rel#140, 
> importance=0.5904900000000001
> rel#140:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, voter]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
> rel#330:AbstractConverter.LOGICAL.ANY([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#349:AbstractConverter.LOGICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#329:Subset#24.PHYSICAL.ANY([]).[], best=rel#347, importance=0.531441
> rel#331:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#350:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], 
> selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 
> network}
> rel#348:Subset#24.PHYSICAL.SINGLETON([]).[], best=rel#347, 
> importance=0.4782969000000001
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], 
> selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 
> network}
> Set#25, type: (DrillRecordRow[*, age])
> rel#311:Subset#25.LOGICAL.ANY([]).[], best=rel#310, importance=0.6561
> rel#310:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 
> network}
> rel#355:AbstractConverter.LOGICAL.ANY([]).[](child=rel#354:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#354:Subset#25.PHYSICAL.SINGLETON([]).[], best=rel#353, 
> importance=0.5904900000000001
> rel#356:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#311:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#353:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 
> network}
> Set#26, type: RecordType(ANY *, ANY age, ANY name, ANY studentnum, ANY *0, 
> ANY age0)
> rel#313:Subset#26.LOGICAL.ANY([]).[], best=rel#312, 
> importance=0.7290000000000001
> rel#312:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#308:Subset#23.LOGICAL.ANY([]).[],right=rel#311:Subset#25.LOGICAL.ANY([]).[],condition=true,joinType=inner),
>  rowcount=22500.0, cumulative cost={4001.0 rows, 14001.0 cpu, 0.0 io, 0.0 
> network}
> rel#327:AbstractConverter.LOGICAL.ANY([]).[](child=rel#326:Subset#26.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#326:Subset#26.PHYSICAL.ANY([]).[], best=null, importance=0.6561
> rel#328:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> Set#27, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#315:Subset#27.LOGICAL.ANY([]).[], best=rel#314, importance=0.81
> rel#314:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],name=$2,age=$1,studentnum=$3),
>  rowcount=22500.0, cumulative cost={26501.0 rows, 14013.0 cpu, 0.0 io, 0.0 
> network}
> rel#322:AbstractConverter.LOGICAL.ANY([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#321:Subset#27.PHYSICAL.SINGLETON([]).[], best=null, 
> importance=0.7290000000000001
> rel#323:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> Set#28, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#317:Subset#28.LOGICAL.ANY([]).[], best=rel#316, importance=0.9
> rel#316:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[]),
>  rowcount=22500.0, cumulative cost={28751.0 rows, 16263.0 cpu, 0.0 io, 0.0 
> network}
> rel#319:AbstractConverter.LOGICAL.ANY([]).[](child=rel#318:Subset#28.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#318:Subset#28.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
> rel#320:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#317:Subset#28.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> rel#324:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:445)
>  ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) 
> ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:669)
>  ~[optiq-core-0.7-20140513.013236-5.jar:na]
> net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:271) 
> ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:119)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:89)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:134)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:338) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-786) Implement CROSS JOIN

Reply via email to