Re: Cannot create physical plan with join

Roman Kondakov Sat, 11 Apr 2020 07:00:47 -0700

Hi Tim,

it looks like your physical converter rule for the Join node does not
convert its inputs to your custom FLOWDB convention. Because of that,
the PhysicalJoin tries to read its input rows from the LogicalTableScan
nodes. You have:


PhysicalJoin[FLOWDB]
  LogicalTableScan[NONE] <- logical rels have infinite cost
  LogicalTableScan[NONE] <- logical rels have infinite cost

but it should be

PhysicalJoin[FLOWDB]
  PhysicalTableScan[FLOWDB]
  PhysicalTableScan[FLOWDB]
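Roughly why the first plan can never be chosen: the planner computes a rel's cumulative cost as its own cost plus the cumulative cost of its inputs, and rels in the logical NONE convention are costed as infinite. The Java below is only a toy model of that arithmetic, not Calcite's API. The class names, the single-number costs and the values 100 and 1500 are made up for illustration.

```java
import java.util.List;

/**
 * Toy model (NOT Calcite classes) of cumulative cost:
 * cumulative cost = own cost + cumulative cost of every input.
 * NONE-convention (logical) nodes are treated as infinitely expensive,
 * mirroring "cumulative cost={inf}" in the planner dump.
 */
public class CostDemo {
  enum Convention { NONE, FLOWDB }

  record Node(String name, Convention convention, double selfCost, List<Node> inputs) {
    double cumulativeCost() {
      // A logical node cannot be executed, so it gets infinite cost.
      double total = convention == Convention.NONE ? Double.POSITIVE_INFINITY : selfCost;
      for (Node input : inputs) {
        total += input.cumulativeCost();
      }
      return total;
    }
  }

  public static void main(String[] args) {
    Node logicalScan = new Node("LogicalTableScan", Convention.NONE, 100, List.of());
    Node physicalScan = new Node("PhysicalTableScan", Convention.FLOWDB, 100, List.of());

    // Inputs left in NONE: the physical join inherits an infinite cost.
    Node badJoin = new Node("PhysicalJoin", Convention.FLOWDB, 1500,
        List.of(logicalScan, logicalScan));
    // Inputs converted to FLOWDB: the cost becomes finite (1500 + 100 + 100).
    Node goodJoin = new Node("PhysicalJoin", Convention.FLOWDB, 1500,
        List.of(physicalScan, physicalScan));

    System.out.println(badJoin.cumulativeCost());
    System.out.println(goodJoin.cumulativeCost());
  }
}
```

A single infinite input is enough to make the whole subtree infinite, which is why the error says the inputs "have relevant nodes" yet the cost is still infinite.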

To get the second plan, you need to convert both inputs of the
PhysicalJoin node to the FLOWDB convention using the RelOptRule#convert()
and RelTraitSet#replace(FLOWDBConvention.INSTANCE) methods. You can find
examples in any join converter rule, e.g. BindableJoinRule#convert [1].
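A toy sketch of the difference between a rule that only changes the join's own convention and one that also converts its inputs. In the planner dump quoted below, rel#80:PhysicalJoin still points at RelSubset#70 and RelSubset#71 (the NONE subsets), while the FLOWDB scans live in Subset#84 and Subset#82. Every class here is a hypothetical stand-in, not Calcite code: the static `convert` plays the role of RelOptRule#convert(RelNode, RelTraitSet), and `TraitSet.replace` plays the role of RelTraitSet#replace. In a real rule you would pass join.getLeft() and join.getRight() with their getTraitSet().replace(FLOWDBConvention.INSTANCE), the way BindableJoinRule does for its own convention.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (NOT Calcite classes) of a join converter rule.
 * In Calcite a rel's inputs are subsets: "the rels in this equivalence set
 * that have these traits". If the inputs stay NONE, only logical rels can
 * ever feed the join.
 */
public class JoinRuleDemo {
  enum Convention { NONE, FLOWDB }

  /** Hypothetical stand-in for RelTraitSet; only the convention is modelled. */
  record TraitSet(Convention convention) {
    TraitSet replace(Convention c) { return new TraitSet(c); }
  }

  /** Hypothetical stand-in for RelSubset: an equivalence set seen with given traits. */
  record Subset(String setName, TraitSet traits) {}

  record Rel(String kind, TraitSet traits, List<Subset> inputs) {}

  /** Hypothetical stand-in for RelOptRule.convert(rel, toTraits). */
  static Subset convert(Subset input, TraitSet toTraits) {
    return new Subset(input.setName(), toTraits);
  }

  /** Buggy rule: the join becomes FLOWDB, but its inputs still point at NONE subsets. */
  static Rel buggyConvert(Rel logicalJoin) {
    return new Rel("PhysicalJoin",
        logicalJoin.traits().replace(Convention.FLOWDB), logicalJoin.inputs());
  }

  /** Fixed rule: every input is also requested in the FLOWDB convention. */
  static Rel fixedConvert(Rel logicalJoin) {
    List<Subset> newInputs = new ArrayList<>();
    for (Subset input : logicalJoin.inputs()) {
      newInputs.add(convert(input, input.traits().replace(Convention.FLOWDB)));
    }
    return new Rel("PhysicalJoin",
        logicalJoin.traits().replace(Convention.FLOWDB), newInputs);
  }

  public static void main(String[] args) {
    TraitSet none = new TraitSet(Convention.NONE);
    Rel logicalJoin = new Rel("LogicalJoin", none,
        List.of(new Subset("Set#0", none), new Subset("Set#1", none)));
    System.out.println(buggyConvert(logicalJoin).inputs()); // inputs still NONE
    System.out.println(fixedConvert(logicalJoin).inputs()); // inputs now FLOWDB
  }
}
```

The only difference between the two rules is the loop that re-requests each input with the FLOWDB trait; that is what lets the planner connect the join to the finite-cost physical scans.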


[1]
https://github.com/apache/calcite/blob/3755eb5871860f1fd5dc51990129784caa8ac0a4/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L476

-- 
Kind Regards
Roman Kondakov


On 11.04.2020 14:22, Tim Fox wrote:
> Hi All,
> 
> I have recently started using Calcite as the query parser/planner for a
> side project. I have created a set of RelNodes corresponding to my physical
> nodes, and a set of rules. I have created my own convention.
> 
> All works well for queries without a join - my physical nodes are
> created fine (aggregates, projections, filters, table scans, all ok).
> 
> When I try to transform a query that contains a join into my physical
> plan, I get the following error:
> 
> "There are not enough rules to produce a node with desired properties:
> convention=FLOWDB, sort=[]. All the inputs have relevant nodes, however the
> cost is still infinite."
> 
> (full error output at bottom of the post)
> 
> I came across this issue while googling the error:
> 
> https://issues.apache.org/jira/browse/CALCITE-3255
> 
> I have checked and I am specifying my convention when transforming to the
> physical plan, and my rules seem to be set up ok.
> 
> There is one comment in the linked issue that may be relevant:
> 
> "You should also supply the metadata in your convention nodes, so that our
> metadata system can compute the cumulative cost correctly."
> 
> But I don't really understand what this means. Can someone explain to a
> newb like me what metadata is required and how I provide it?
> 
> Many thanks,
> 
> full error report:
> 
> _INITIAL: There are not enough rules to produce a node with desired
> properties: convention=FLOWDB, sort=[]. All the inputs have relevant nodes,
> however the cost is still infinite.
> 
> Root: rel#76:Subset#3.FLOWDB.[]
> 
> Original rel:
> 
> LogicalProject(subset=[rel#76:Subset#3.FLOWDB.[]], sensor_id=[$0],
> temp=[$2], name=[$4], country=[$5]): rowcount = 1500.0, cumulative cost =
> {1500.0 rows, 6000.0 cpu, 0.0 io}, id = 74
> 
>   LogicalJoin(subset=[rel#73:Subset#2.NONE.[]], condition=[=($1, $3)],
> joinType=[left]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 0.0
> cpu, 0.0 io}, id = 72
> 
>     LogicalTableScan(subset=[rel#70:Subset#0.NONE.[]],
> table=[[latest_sensor_readings]]): rowcount = 100.0, cumulative cost =
> {100.0 rows, 101.0 cpu, 0.0 io}, id = 65
> 
>     LogicalTableScan(subset=[rel#71:Subset#1.NONE.[]],
> table=[[current_locations]]): rowcount = 100.0, cumulative cost = {100.0
> rows, 101.0 cpu, 0.0 io}, id = 66
> 
> Sets:
> 
> Set#0, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE temp)
> 
> rel#70:Subset#0.NONE.[], best=null, importance=0.7290000000000001
> 
> rel#65:LogicalTableScan.NONE.[](table=[latest_sensor_readings]),
> rowcount=100.0, cumulative cost={inf}
> 
> rel#84:Subset#0.FLOWDB.[], best=rel#83, importance=0.36450000000000005
> 
> rel#83:PhysicalTableScan.FLOWDB.[](table=[latest_sensor_readings]),
> rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> 
> Set#1, type: RecordType(BIGINT location_id, VARCHAR name, VARCHAR country)
> 
> rel#71:Subset#1.NONE.[], best=null, importance=0.7290000000000001
> 
> rel#66:LogicalTableScan.NONE.[](table=[current_locations]), rowcount=100.0,
> cumulative cost={inf}
> 
> rel#82:Subset#1.FLOWDB.[], best=rel#81, importance=0.36450000000000005
> 
> rel#81:PhysicalTableScan.FLOWDB.[](table=[current_locations]),
> rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> 
> Set#2, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE temp,
> BIGINT location_id0, VARCHAR name, VARCHAR country)
> 
> rel#73:Subset#2.NONE.[], best=null, importance=0.81
> 
> rel#72:LogicalJoin.NONE.[](left=RelSubset#70,right=RelSubset#71,condition==($1,
> $3),joinType=left), rowcount=1500.0, cumulative cost={inf}
> 
> rel#78:Subset#2.FLOWDB.[], best=null, importance=0.9
> 
> rel#80:PhysicalJoin.FLOWDB.[](left=RelSubset#70,right=RelSubset#71,condition==($1,
> $3),joinType=left), rowcount=1500.0, cumulative cost={inf}
> 
> Set#3, type: RecordType(VARCHAR sensor_id, DOUBLE temp, VARCHAR name,
> VARCHAR country)
> 
> rel#75:Subset#3.NONE.[], best=null, importance=0.9
> 
> rel#74:LogicalProject.NONE.[](input=RelSubset#73,inputs=0,exprs=[$2, $4,
> $5]), rowcount=1500.0, cumulative cost={inf}
> 
> rel#76:Subset#3.FLOWDB.[], best=null, importance=1.0
> 
> rel#77:AbstractConverter.FLOWDB.[](input=RelSubset#75,convention=FLOWDB,sort=[]),
> rowcount=1500.0, cumulative cost={inf}
> 
> rel#79:PhysicalProject.FLOWDB.[](input=RelSubset#78,inputs=0,exprs=[$2, $4,
> $5]), rowcount=1500.0, cumulative cost={inf}
> 
> Graphviz:
> 
> digraph G {
> 
> root [style=filled,label="Root"];
> 
> subgraph cluster0{
> 
> label="Set 0 RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE
> temp)";
> 
> rel65
> [label="rel#65:LogicalTableScan\ntable=[latest_sensor_readings]\nrows=100.0,
> cost={inf}",shape=box]
> 
> rel83
> [label="rel#83:PhysicalTableScan\ntable=[latest_sensor_readings]\nrows=100.0,
> cost={100.0 rows, 101.0 cpu, 0.0 io}",color=blue,shape=box]
> 
> subset70 [label="rel#70:Subset#0.NONE.[]"]
> 
> subset84 [label="rel#84:Subset#0.FLOWDB.[]"]
> 
> }
> 
> subgraph cluster1{
> 
> label="Set 1 RecordType(BIGINT location_id, VARCHAR name, VARCHAR country)";
> 
> rel66
> [label="rel#66:LogicalTableScan\ntable=[current_locations]\nrows=100.0,
> cost={inf}",shape=box]
> 
> rel81
> [label="rel#81:PhysicalTableScan\ntable=[current_locations]\nrows=100.0,
> cost={100.0 rows, 101.0 cpu, 0.0 io}",color=blue,shape=box]
> 
> subset71 [label="rel#71:Subset#1.NONE.[]"]
> 
> subset82 [label="rel#82:Subset#1.FLOWDB.[]"]
> 
> }
> 
> subgraph cluster2{
> 
> label="Set 2 RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE temp,
> BIGINT location_id0, VARCHAR name, VARCHAR country)";
> 
> rel72
> [label="rel#72:LogicalJoin\nleft=RelSubset#70,right=RelSubset#71,condition==($1,
> $3),joinType=left\nrows=1500.0, cost={inf}",shape=box]
> 
> rel80
> [label="rel#80:PhysicalJoin\nleft=RelSubset#70,right=RelSubset#71,condition==($1,
> $3),joinType=left\nrows=1500.0, cost={inf}",shape=box]
> 
> subset73 [label="rel#73:Subset#2.NONE.[]"]
> 
> subset78 [label="rel#78:Subset#2.FLOWDB.[]"]
> 
> }
> 
> subgraph cluster3{
> 
> label="Set 3 RecordType(VARCHAR sensor_id, DOUBLE temp, VARCHAR name,
> VARCHAR country)";
> 
> rel74 [label="rel#74:LogicalProject\ninput=RelSubset#73,inputs=0,exprs=[$2,
> $4, $5]\nrows=1500.0, cost={inf}",shape=box]
> 
> rel77
> [label="rel#77:AbstractConverter\ninput=RelSubset#75,convention=FLOWDB,sort=[]\nrows=1500.0,
> cost={inf}",shape=box]
> 
> rel79
> [label="rel#79:PhysicalProject\ninput=RelSubset#78,inputs=0,exprs=[$2, $4,
> $5]\nrows=1500.0, cost={inf}",shape=box]
> 
> subset75 [label="rel#75:Subset#3.NONE.[]"]
> 
> subset76 [label="rel#76:Subset#3.FLOWDB.[]"]
> 
> }
> 
> root -> subset76;
> 
> subset70 -> rel65;
> 
> subset84 -> rel83[color=blue];
> 
> subset71 -> rel66;
> 
> subset82 -> rel81[color=blue];
> 
> subset73 -> rel72; rel72 -> subset70[label="0"]; rel72 ->
> subset71[label="1"];
> 
> subset78 -> rel80; rel80 -> subset70[label="0"]; rel80 ->
> subset71[label="1"];
> 
> subset75 -> rel74; rel74 -> subset73;
> 
> subset76 -> rel77; rel77 -> subset75;
> 
> subset76 -> rel79; rel79 -> subset78;
>
