Re: Cannot create physical plan with join

Stamatis Zampetakis Sun, 12 Apr 2020 06:20:46 -0700

It would be nice to add more examples in the site around this use-case but
the way it is the tutorial right now it is not that easy. Of course it
would be great to have, if somebody has time to invest on this.


Best,
Stamatis

On Sat, Apr 11, 2020 at 6:25 PM Tim Fox <[email protected]> wrote:

> Hi Roman,
>
> That was it! I wasn't calling convert on the inputs to the join
> recursively.
>
> Many thanks.
>
> On Sat, 11 Apr 2020 at 15:00, Roman Kondakov <[email protected]>
> wrote:
>
> > Hi Tim,
> >
> > it looks like your physical converter rule for a Join node does not
> > convert it's inputs to your custom FLOWDB convention. And because of it
> > the PhysicalJoin is trying to get input rows from the LogicalScan.
> > You have:
> >
> > PhysicalJoin[FLOWDB]
> >   LogicalTableScan[NONE] <- logical rels have infinite cost
> >   LogicalTableScan[NONE] <- logical rels have infinite cost
> >
> > but it should be
> >
> > PhysicalJoin[FLOWDB]
> >   PhysicalTableScan[FLOWDB]
> >   PhysicalTableScan[FLOWDB]
> >
> > In order to achieve it you need to convert both inputs of the
> > PhysicalJoin node to the FLOWDB convention using  RelOptRule#convert()
> > and RelTraitSet#replace(FLOWDBConvention.INSTANCE) methods. You can find
> > examples in any join converter rule, i.e. BindableJoinRule#convert [1]
> >
> >
> > [1]
> >
> >
> https://github.com/apache/calcite/blob/3755eb5871860f1fd5dc51990129784caa8ac0a4/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L476
> >
> > --
> > Kind Regards
> > Roman Kondakov
> >
> >
> > On 11.04.2020 14:22, Tim Fox wrote:
> > > Hi All,
> > >
> > > I have recently started using Calcite as the query parser/planner for a
> > > side project. I have created a set of RelNodes corresponding to my
> > physical
> > > nodes, and a set of rules. I have created my own convention.
> > >
> > > All works well for queries without a join - my physical nodes are
> > > created fine (aggregates, projections, filters, table scans, all ok).
> > >
> > > When I try and transform to my physical plan where the query contains a
> > > join, I get the following error:
> > >
> > > "There are not enough rules to produce a node with desired properties:
> > > convention=FLOWDB, sort=[]. All the inputs have relevant nodes, however
> > the
> > > cost is still infinite."
> > >
> > > (full error output at bottom of the post)
> > >
> > > I stumbled upon this post when googling this:
> > >
> > > https://issues.apache.org/jira/browse/CALCITE-3255
> > >
> > > I have checked and I am specifying my convention when transforming to
> the
> > > physical plan, and my rules seem to be set up ok.
> > >
> > > There is one comment in the above linked issue that is perhaps relevant
> > >
> > > "You should also supply the metadata in you convention nodes, so that
> our
> > > metadata system can compute the cumulative cost correctly."
> > >
> > > But I don't really understand what this means. Can someone explain to a
> > > newb like me what metadata is required and how I provide it?
> > >
> > > Many thanks,
> > >
> > > full error report:
> > >
> > > _INITIAL: There are not enough rules to produce a node with desired
> > > properties: convention=FLOWDB, sort=[]. All the inputs have relevant
> > nodes,
> > > however the cost is still infinite.
> > >
> > > Root: rel#76:Subset#3.FLOWDB.[]
> > >
> > > Original rel:
> > >
> > > LogicalProject(subset=[rel#76:Subset#3.FLOWDB.[]], sensor_id=[$0],
> > > temp=[$2], name=[$4], country=[$5]): rowcount = 1500.0, cumulative
> cost =
> > > {1500.0 rows, 6000.0 cpu, 0.0 io}, id = 74
> > >
> > >   LogicalJoin(subset=[rel#73:Subset#2.NONE.[]], condition=[=($1, $3)],
> > > joinType=[left]): rowcount = 1500.0, cumulative cost = {1500.0 rows,
> 0.0
> > > cpu, 0.0 io}, id = 72
> > >
> > >     LogicalTableScan(subset=[rel#70:Subset#0.NONE.[]],
> > > table=[[latest_sensor_readings]]): rowcount = 100.0, cumulative cost =
> > > {100.0 rows, 101.0 cpu, 0.0 io}, id = 65
> > >
> > >     LogicalTableScan(subset=[rel#71:Subset#1.NONE.[]],
> > > table=[[current_locations]]): rowcount = 100.0, cumulative cost =
> {100.0
> > > rows, 101.0 cpu, 0.0 io}, id = 66
> > >
> > > Sets:
> > >
> > > Set#0, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE
> > temp)
> > >
> > > rel#70:Subset#0.NONE.[], best=null, importance=0.7290000000000001
> > >
> > > rel#65:LogicalTableScan.NONE.[](table=[latest_sensor_readings]),
> > > rowcount=100.0, cumulative cost={inf}
> > >
> > > rel#84:Subset#0.FLOWDB.[], best=rel#83, importance=0.36450000000000005
> > >
> > > rel#83:PhysicalTableScan.FLOWDB.[](table=[latest_sensor_readings]),
> > > rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> > >
> > > Set#1, type: RecordType(BIGINT location_id, VARCHAR name, VARCHAR
> > country)
> > >
> > > rel#71:Subset#1.NONE.[], best=null, importance=0.7290000000000001
> > >
> > > rel#66:LogicalTableScan.NONE.[](table=[current_locations]),
> > rowcount=100.0,
> > > cumulative cost={inf}
> > >
> > > rel#82:Subset#1.FLOWDB.[], best=rel#81, importance=0.36450000000000005
> > >
> > > rel#81:PhysicalTableScan.FLOWDB.[](table=[current_locations]),
> > > rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> > >
> > > Set#2, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE
> > temp,
> > > BIGINT location_id0, VARCHAR name, VARCHAR country)
> > >
> > > rel#73:Subset#2.NONE.[], best=null, importance=0.81
> > >
> > >
> >
> rel#72:LogicalJoin.NONE.[](left=RelSubset#70,right=RelSubset#71,condition==($1,
> > > $3),joinType=left), rowcount=1500.0, cumulative cost={inf}
> > >
> > > rel#78:Subset#2.FLOWDB.[], best=null, importance=0.9
> > >
> > >
> >
> rel#80:PhysicalJoin.FLOWDB.[](left=RelSubset#70,right=RelSubset#71,condition==($1,
> > > $3),joinType=left), rowcount=1500.0, cumulative cost={inf}
> > >
> > > Set#3, type: RecordType(VARCHAR sensor_id, DOUBLE temp, VARCHAR name,
> > > VARCHAR country)
> > >
> > > rel#75:Subset#3.NONE.[], best=null, importance=0.9
> > >
> > > rel#74:LogicalProject.NONE.[](input=RelSubset#73,inputs=0,exprs=[$2,
> $4,
> > > $5]), rowcount=1500.0, cumulative cost={inf}
> > >
> > > rel#76:Subset#3.FLOWDB.[], best=null, importance=1.0
> > >
> > >
> >
> rel#77:AbstractConverter.FLOWDB.[](input=RelSubset#75,convention=FLOWDB,sort=[]),
> > > rowcount=1500.0, cumulative cost={inf}
> > >
> > > rel#79:PhysicalProject.FLOWDB.[](input=RelSubset#78,inputs=0,exprs=[$2,
> > $4,
> > > $5]), rowcount=1500.0, cumulative cost={inf}
> > >
> > > Graphviz:
> > >
> > > digraph G {
> > >
> > > root [style=filled,label="Root"];
> > >
> > > subgraph cluster0{
> > >
> > > label="Set 0 RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE
> > > temp)";
> > >
> > > rel65
> > >
> >
> [label="rel#65:LogicalTableScan\ntable=[latest_sensor_readings]\nrows=100.0,
> > > cost={inf}",shape=box]
> > >
> > > rel83
> > >
> >
> [label="rel#83:PhysicalTableScan\ntable=[latest_sensor_readings]\nrows=100.0,
> > > cost={100.0 rows, 101.0 cpu, 0.0 io}",color=blue,shape=box]
> > >
> > > subset70 [label="rel#70:Subset#0.NONE.[]"]
> > >
> > > subset84 [label="rel#84:Subset#0.FLOWDB.[]"]
> > >
> > > }
> > >
> > > subgraph cluster1{
> > >
> > > label="Set 1 RecordType(BIGINT location_id, VARCHAR name, VARCHAR
> > country)";
> > >
> > > rel66
> > > [label="rel#66:LogicalTableScan\ntable=[current_locations]\nrows=100.0,
> > > cost={inf}",shape=box]
> > >
> > > rel81
> > >
> [label="rel#81:PhysicalTableScan\ntable=[current_locations]\nrows=100.0,
> > > cost={100.0 rows, 101.0 cpu, 0.0 io}",color=blue,shape=box]
> > >
> > > subset71 [label="rel#71:Subset#1.NONE.[]"]
> > >
> > > subset82 [label="rel#82:Subset#1.FLOWDB.[]"]
> > >
> > > }
> > >
> > > subgraph cluster2{
> > >
> > > label="Set 2 RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE
> > temp,
> > > BIGINT location_id0, VARCHAR name, VARCHAR country)";
> > >
> > > rel72
> > >
> >
> [label="rel#72:LogicalJoin\nleft=RelSubset#70,right=RelSubset#71,condition==($1,
> > > $3),joinType=left\nrows=1500.0, cost={inf}",shape=box]
> > >
> > > rel80
> > >
> >
> [label="rel#80:PhysicalJoin\nleft=RelSubset#70,right=RelSubset#71,condition==($1,
> > > $3),joinType=left\nrows=1500.0, cost={inf}",shape=box]
> > >
> > > subset73 [label="rel#73:Subset#2.NONE.[]"]
> > >
> > > subset78 [label="rel#78:Subset#2.FLOWDB.[]"]
> > >
> > > }
> > >
> > > subgraph cluster3{
> > >
> > > label="Set 3 RecordType(VARCHAR sensor_id, DOUBLE temp, VARCHAR name,
> > > VARCHAR country)";
> > >
> > > rel74
> > [label="rel#74:LogicalProject\ninput=RelSubset#73,inputs=0,exprs=[$2,
> > > $4, $5]\nrows=1500.0, cost={inf}",shape=box]
> > >
> > > rel77
> > >
> >
> [label="rel#77:AbstractConverter\ninput=RelSubset#75,convention=FLOWDB,sort=[]\nrows=1500.0,
> > > cost={inf}",shape=box]
> > >
> > > rel79
> > > [label="rel#79:PhysicalProject\ninput=RelSubset#78,inputs=0,exprs=[$2,
> > $4,
> > > $5]\nrows=1500.0, cost={inf}",shape=box]
> > >
> > > subset75 [label="rel#75:Subset#3.NONE.[]"]
> > >
> > > subset76 [label="rel#76:Subset#3.FLOWDB.[]"]
> > >
> > > }
> > >
> > > root -> subset76;
> > >
> > > subset70 -> rel65;
> > >
> > > subset84 -> rel83[color=blue];
> > >
> > > subset71 -> rel66;
> > >
> > > subset82 -> rel81[color=blue];
> > >
> > > subset73 -> rel72; rel72 -> subset70[label="0"]; rel72 ->
> > > subset71[label="1"];
> > >
> > > subset78 -> rel80; rel80 -> subset70[label="0"]; rel80 ->
> > > subset71[label="1"];
> > >
> > > subset75 -> rel74; rel74 -> subset73;
> > >
> > > subset76 -> rel77; rel77 -> subset75;
> > >
> > > subset76 -> rel79; rel79 -> subset78;
> > >
> >
>

Re: Cannot create physical plan with join

Reply via email to