[jira] [Commented] (DRILL-786) Implement CROSS JOIN

Hanumath Rao Maduri (JIRA) Tue, 02 Oct 2018 13:28:08 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16636073#comment-16636073
 ]


Hanumath Rao Maduri commented on DRILL-786:
-------------------------------------------

IMO, the option 3 is what the short term solution for this problem was. i.e 
Treat the explicit CROSS JOIN and implicit cross join same. Planner should 
generate the plan when the flag is enabled (which is true by default) for 
scalar query cases. Otherwise it should throw an error.

I am fine with the option 3 but I am not sure if changing the default value is 
needed.

> Implement CROSS JOIN
> --------------------
>
>                 Key: DRILL-786
>                 URL: https://issues.apache.org/jira/browse/DRILL-786
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning &amp; Optimization
>            Reporter: Krystal
>            Assignee: Igor Guzenko
>            Priority: Major
>             Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.5904900000000001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 
> network}
> rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#337:AbstractConverter.LOGICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#332:Subset#22.PHYSICAL.ANY([]).[], best=rel#335, importance=0.531441
> rel#334:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#338:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], 
> selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], 
> SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> rel#336:Subset#22.PHYSICAL.SINGLETON([]).[], best=rel#335, 
> importance=0.4782969000000001
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], 
> selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], 
> SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> Set#23, type: (DrillRecordRow[*, age, name, studentnum])
> rel#308:Subset#23.LOGICAL.ANY([]).[], best=rel#307, importance=0.6561
> rel#307:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 
> network}
> rel#343:AbstractConverter.LOGICAL.ANY([]).[](child=rel#342:Subset#23.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#342:Subset#23.PHYSICAL.SINGLETON([]).[], best=rel#341, 
> importance=0.5904900000000001
> rel#344:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#308:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#341:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 
> network}
> Set#24, type: (DrillRecordRow[*, age])
> rel#309:Subset#24.LOGICAL.ANY([]).[], best=rel#140, 
> importance=0.5904900000000001
> rel#140:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, voter]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
> rel#330:AbstractConverter.LOGICAL.ANY([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#349:AbstractConverter.LOGICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#329:Subset#24.PHYSICAL.ANY([]).[], best=rel#347, importance=0.531441
> rel#331:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#350:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], 
> selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 
> network}
> rel#348:Subset#24.PHYSICAL.SINGLETON([]).[], best=rel#347, 
> importance=0.4782969000000001
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], 
> selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 
> network}
> Set#25, type: (DrillRecordRow[*, age])
> rel#311:Subset#25.LOGICAL.ANY([]).[], best=rel#310, importance=0.6561
> rel#310:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 
> network}
> rel#355:AbstractConverter.LOGICAL.ANY([]).[](child=rel#354:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#354:Subset#25.PHYSICAL.SINGLETON([]).[], best=rel#353, 
> importance=0.5904900000000001
> rel#356:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#311:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#353:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 
> network}
> Set#26, type: RecordType(ANY *, ANY age, ANY name, ANY studentnum, ANY *0, 
> ANY age0)
> rel#313:Subset#26.LOGICAL.ANY([]).[], best=rel#312, 
> importance=0.7290000000000001
> rel#312:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#308:Subset#23.LOGICAL.ANY([]).[],right=rel#311:Subset#25.LOGICAL.ANY([]).[],condition=true,joinType=inner),
>  rowcount=22500.0, cumulative cost={4001.0 rows, 14001.0 cpu, 0.0 io, 0.0 
> network}
> rel#327:AbstractConverter.LOGICAL.ANY([]).[](child=rel#326:Subset#26.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#326:Subset#26.PHYSICAL.ANY([]).[], best=null, importance=0.6561
> rel#328:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> Set#27, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#315:Subset#27.LOGICAL.ANY([]).[], best=rel#314, importance=0.81
> rel#314:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],name=$2,age=$1,studentnum=$3),
>  rowcount=22500.0, cumulative cost={26501.0 rows, 14013.0 cpu, 0.0 io, 0.0 
> network}
> rel#322:AbstractConverter.LOGICAL.ANY([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#321:Subset#27.PHYSICAL.SINGLETON([]).[], best=null, 
> importance=0.7290000000000001
> rel#323:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> Set#28, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#317:Subset#28.LOGICAL.ANY([]).[], best=rel#316, importance=0.9
> rel#316:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[]),
>  rowcount=22500.0, cumulative cost={28751.0 rows, 16263.0 cpu, 0.0 io, 0.0 
> network}
> rel#319:AbstractConverter.LOGICAL.ANY([]).[](child=rel#318:Subset#28.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#318:Subset#28.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
> rel#320:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#317:Subset#28.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> rel#324:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:445)
>  ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) 
> ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:669)
>  ~[optiq-core-0.7-20140513.013236-5.jar:na]
> net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:271) 
> ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:119)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:89)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:134)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:338) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (DRILL-786) Implement CROSS JOIN

Reply via email to