[jira] [Commented] (DRILL-786) Implement CROSS JOIN

Igor Guzenko (JIRA) Mon, 01 Oct 2018 10:00:25 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634323#comment-16634323
 ]


Igor Guzenko commented on DRILL-786:
------------------------------------

I tried adding a joinContext map to Calcite's Join class and passing it
through every point where a join instance may be copied or recreated:
JoinToMultiJoinRule.java
LogicalJoin.java
LoptOptimizeJoinRule.java
MultiJoin.java
MutableRels.java
PigRelFactories.java
RelBuilder.java
RelFactories.java
RelStructuredTypeFlattener.java
SqlToRelConverter.java
SubQueryRemoveRule.java 

But even with these extensive changes I wasn't able to solve the case where
both implicit and explicit cross joins are present in one query and the option
planner.enable_nljoin_for_scalar_only is set to true. Such a query should fail
with an exception that says: "This query cannot be planned possibly due to
either a cartesian join or an inequality join", but it works... I suggest
leaving this case aside and simply enabling NestedLoopJoin when an explicit
cross join is present in the original query. Such a solution is easier to
implement and won't require any changes to Calcite.
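
To illustrate the suggestion above, here is a minimal Java sketch of the
decision rule only. It is not Drill or Calcite code: the class
CrossJoinPolicy, the method allowNestedLoopJoin and its three boolean
parameters are hypothetical names introduced for this example, and the
reading of planner.enable_nljoin_for_scalar_only (nested loop join is
restricted to a scalar right input when the option is true) is an assumption
about how the option behaves.

```java
// Hypothetical illustration; none of these names exist in Drill or Calcite.
final class CrossJoinPolicy {

    private CrossJoinPolicy() {
    }

    /**
     * Decides whether the planner may use a nested loop join.
     *
     * @param nlJoinForScalarOnly assumed value of planner.enable_nljoin_for_scalar_only
     * @param rightInputIsScalar  whether the right join input is known to be scalar
     * @param explicitCrossJoin   whether the original query contains an explicit CROSS JOIN
     */
    static boolean allowNestedLoopJoin(boolean nlJoinForScalarOnly,
                                       boolean rightInputIsScalar,
                                       boolean explicitCrossJoin) {
        // Proposed shortcut: an explicit CROSS JOIN always enables nested loop join.
        if (explicitCrossJoin) {
            return true;
        }
        // Otherwise fall back to the option: unrestricted when it is off,
        // scalar right input required when it is on.
        return !nlJoinForScalarOnly || rightInputIsScalar;
    }

    public static void main(String[] args) {
        // Explicit CROSS JOIN with the option on and a non-scalar right input.
        System.out.println(allowNestedLoopJoin(true, false, true));
        // Implicit cartesian join under the same settings.
        System.out.println(allowNestedLoopJoin(true, false, false));
    }
}
```

Because the rule looks only at whether the query as a whole contains an
explicit cross join, it does not distinguish individual joins inside a query
that mixes implicit and explicit cross joins, which matches the case the
comment proposes to leave aside.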

> Implement CROSS JOIN
> --------------------
>
>                 Key: DRILL-786
>                 URL: https://issues.apache.org/jira/browse/DRILL-786
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Query Planning & Optimization
>            Reporter: Krystal
>            Assignee: Igor Guzenko
>            Priority: Major
>             Fix For: 1.15.0
>
>
> git.commit.id.abbrev=5d7e3d3
> 0: jdbc:drill:schema=dfs> select student.name, student.age, 
> student.studentnum from student cross join voter where student.age = 20 and 
> voter.age = 20;
> Query failed: org.apache.drill.exec.rpc.RpcException: Remote failure while 
> running query.[error_id: "af90e65a-c4d7-4635-a436-bbc1444c8db2"
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Stack trace:
> org.eigenbase.relopt.RelOptPlanner$CannotPlanException: Node 
> [rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]] could not be implemented; 
> planner state:
> Root: rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]
> Original rel:
> AbstractConverter(subset=[rel#318:Subset#28.PHYSICAL.SINGLETON([]).[]], 
> convention=[PHYSICAL], DrillDistributionTraitDef=[SINGLETON([])], sort=[[]]): 
> rowcount = 22500.0, cumulative cost = {inf}, id = 320
>   DrillScreenRel(subset=[rel#317:Subset#28.LOGICAL.ANY([]).[]]): rowcount = 
> 22500.0, cumulative cost = {2250.0 rows, 2250.0 cpu, 0.0 io, 0.0 network}, id 
> = 316
>     DrillProjectRel(subset=[rel#315:Subset#27.LOGICAL.ANY([]).[]], name=[$2], 
> age=[$1], studentnum=[$3]): rowcount = 22500.0, cumulative cost = {22500.0 
> rows, 12.0 cpu, 0.0 io, 0.0 network}, id = 314
>       DrillJoinRel(subset=[rel#313:Subset#26.LOGICAL.ANY([]).[]], 
> condition=[true], joinType=[inner]): rowcount = 22500.0, cumulative cost = 
> {22500.0 rows, 0.0 cpu, 0.0 io, 0.0 network}, id = 312
>         DrillFilterRel(subset=[rel#308:Subset#23.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 307
>           DrillScanRel(subset=[rel#306:Subset#22.LOGICAL.ANY([]).[]], 
> table=[[dfs, student]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 4000.0 cpu, 0.0 io, 0.0 network}, id = 129
>         DrillFilterRel(subset=[rel#311:Subset#25.LOGICAL.ANY([]).[]], 
> condition=[=(CAST($1):INTEGER, 20)]): rowcount = 150.0, cumulative cost = 
> {1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}, id = 310
>           DrillScanRel(subset=[rel#309:Subset#24.LOGICAL.ANY([]).[]], 
> table=[[dfs, voter]]): rowcount = 1000.0, cumulative cost = {1000.0 rows, 
> 2000.0 cpu, 0.0 io, 0.0 network}, id = 140
> Sets:
> Set#22, type: (DrillRecordRow[*, age, name, studentnum])
> rel#306:Subset#22.LOGICAL.ANY([]).[], best=rel#129, 
> importance=0.5904900000000001
> rel#129:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, student]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 
> network}
> rel#333:AbstractConverter.LOGICAL.ANY([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#337:AbstractConverter.LOGICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#332:Subset#22.PHYSICAL.ANY([]).[], best=rel#335, importance=0.531441
> rel#334:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#338:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#336:Subset#22.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], 
> selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], 
> SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> rel#336:Subset#22.PHYSICAL.SINGLETON([]).[], best=rel#335, 
> importance=0.4782969000000001
> rel#339:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#340:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#335:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/student]], 
> selectionRoot=/drill/testdata/p1tests/student, columns=[SchemaPath [`age`], 
> SchemaPath [`name`], SchemaPath [`studentnum`]]]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 4000.0 cpu, 0.0 io, 0.0 network}
> Set#23, type: (DrillRecordRow[*, age, name, studentnum])
> rel#308:Subset#23.LOGICAL.ANY([]).[], best=rel#307, importance=0.6561
> rel#307:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#306:Subset#22.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 
> network}
> rel#343:AbstractConverter.LOGICAL.ANY([]).[](child=rel#342:Subset#23.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#342:Subset#23.PHYSICAL.SINGLETON([]).[], best=rel#341, 
> importance=0.5904900000000001
> rel#344:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#308:Subset#23.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#341:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#332:Subset#22.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 8000.0 cpu, 0.0 io, 0.0 
> network}
> Set#24, type: (DrillRecordRow[*, age])
> rel#309:Subset#24.LOGICAL.ANY([]).[], best=rel#140, 
> importance=0.5904900000000001
> rel#140:DrillScanRel.LOGICAL.ANY([]).[](table=[dfs, voter]), rowcount=1000.0, 
> cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 network}
> rel#330:AbstractConverter.LOGICAL.ANY([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#349:AbstractConverter.LOGICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#329:Subset#24.PHYSICAL.ANY([]).[], best=rel#347, importance=0.531441
> rel#331:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#350:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#348:Subset#24.PHYSICAL.SINGLETON([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], 
> selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 
> network}
> rel#348:Subset#24.PHYSICAL.SINGLETON([]).[], best=rel#347, 
> importance=0.4782969000000001
> rel#351:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#352:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=1000.0, cumulative cost={inf}
> rel#347:ScanPrel.PHYSICAL.SINGLETON([]).[](groupscan=ParquetGroupScan 
> [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/p1tests/voter]], 
> selectionRoot=/drill/testdata/p1tests/voter, columns=[SchemaPath [`age`]]]), 
> rowcount=1000.0, cumulative cost={1000.0 rows, 2000.0 cpu, 0.0 io, 0.0 
> network}
> Set#25, type: (DrillRecordRow[*, age])
> rel#311:Subset#25.LOGICAL.ANY([]).[], best=rel#310, importance=0.6561
> rel#310:DrillFilterRel.LOGICAL.ANY([]).[](child=rel#309:Subset#24.LOGICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 
> network}
> rel#355:AbstractConverter.LOGICAL.ANY([]).[](child=rel#354:Subset#25.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#354:Subset#25.PHYSICAL.SINGLETON([]).[], best=rel#353, 
> importance=0.5904900000000001
> rel#356:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#311:Subset#25.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=150.0, cumulative cost={inf}
> rel#353:FilterPrel.PHYSICAL.SINGLETON([]).[](child=rel#329:Subset#24.PHYSICAL.ANY([]).[],condition==(CAST($1):INTEGER,
>  20)), rowcount=150.0, cumulative cost={2000.0 rows, 6000.0 cpu, 0.0 io, 0.0 
> network}
> Set#26, type: RecordType(ANY *, ANY age, ANY name, ANY studentnum, ANY *0, 
> ANY age0)
> rel#313:Subset#26.LOGICAL.ANY([]).[], best=rel#312, 
> importance=0.7290000000000001
> rel#312:DrillJoinRel.LOGICAL.ANY([]).[](left=rel#308:Subset#23.LOGICAL.ANY([]).[],right=rel#311:Subset#25.LOGICAL.ANY([]).[],condition=true,joinType=inner),
>  rowcount=22500.0, cumulative cost={4001.0 rows, 14001.0 cpu, 0.0 io, 0.0 
> network}
> rel#327:AbstractConverter.LOGICAL.ANY([]).[](child=rel#326:Subset#26.PHYSICAL.ANY([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#326:Subset#26.PHYSICAL.ANY([]).[], best=null, importance=0.6561
> rel#328:AbstractConverter.PHYSICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> Set#27, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#315:Subset#27.LOGICAL.ANY([]).[], best=rel#314, importance=0.81
> rel#314:DrillProjectRel.LOGICAL.ANY([]).[](child=rel#313:Subset#26.LOGICAL.ANY([]).[],name=$2,age=$1,studentnum=$3),
>  rowcount=22500.0, cumulative cost={26501.0 rows, 14013.0 cpu, 0.0 io, 0.0 
> network}
> rel#322:AbstractConverter.LOGICAL.ANY([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#321:Subset#27.PHYSICAL.SINGLETON([]).[], best=null, 
> importance=0.7290000000000001
> rel#323:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> Set#28, type: RecordType(ANY name, ANY age, ANY studentnum)
> rel#317:Subset#28.LOGICAL.ANY([]).[], best=rel#316, importance=0.9
> rel#316:DrillScreenRel.LOGICAL.ANY([]).[](child=rel#315:Subset#27.LOGICAL.ANY([]).[]),
>  rowcount=22500.0, cumulative cost={28751.0 rows, 16263.0 cpu, 0.0 io, 0.0 
> network}
> rel#319:AbstractConverter.LOGICAL.ANY([]).[](child=rel#318:Subset#28.PHYSICAL.SINGLETON([]).[],convention=LOGICAL,DrillDistributionTraitDef=ANY([]),sort=[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> rel#318:Subset#28.PHYSICAL.SINGLETON([]).[], best=null, importance=1.0
> rel#320:AbstractConverter.PHYSICAL.SINGLETON([]).[](child=rel#317:Subset#28.LOGICAL.ANY([]).[],convention=PHYSICAL,DrillDistributionTraitDef=SINGLETON([]),sort=[]),
>  rowcount=22500.0, cumulative cost={inf}
> rel#324:ScreenPrel.PHYSICAL.SINGLETON([]).[](child=rel#321:Subset#27.PHYSICAL.SINGLETON([]).[]),
>  rowcount=1.7976931348623157E308, cumulative cost={inf}
> org.eigenbase.relopt.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:445)
>  ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.RelSubset.buildCheapestPlan(RelSubset.java:287) 
> ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.eigenbase.relopt.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:669)
>  ~[optiq-core-0.7-20140513.013236-5.jar:na]
> net.hydromatic.optiq.prepare.PlannerImpl.transform(PlannerImpl.java:271) 
> ~[optiq-core-0.7-20140513.013236-5.jar:na]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToPrel(DefaultSqlHandler.java:119)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:89)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:134)
>  
> ~[drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:338) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:186) 
> [drill-java-exec-1.0.0-m2-incubating-SNAPSHOT-rebuffed.jar:1.0.0-m2-incubating-SNAPSHOT]
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_45]
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_45]
> java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
