Hello,
I have tried to create my own operators with their respective cost model.
Simple cases work, but when I have certain aggregation examples something
goes wrong.
I have this complex query:
*with R as (*
*select L.rowtime , L.productid , L.orderid , L.customerid*
*from*
*(select rowtime, customerid, avg(units) as globalAvgLoad*
*from s.orders*
*group by rowtime, customerid, hop(rowtime, interval '1' second, interval
'1' hour) ) as G,*
*(select rowtime, productid, orderid, customerid, avg(units) as
localAvgLoad*
*from s.orders*
*group by rowtime, productid, orderid, customerid, hop(rowtime, interval
'1' second, interval '1' hour) )as L*
*where L.customerid = G.customerid and L.localAvgLoad > G.globalAvgLoad*
*) *
*select rowtime, customerid, count(*) *
*from R*
*group by rowtime, customerid;*
When I *use* the rule *AggregateReduceFunctionsRule* along with other
rules, Volcano Planner returns to me the expected plan for execution, which
is
SaberAggregateRel(group=[{0, 3}], EXPR$2=[COUNT()]): rowcount = 1875.0,
cumulative cost = {22859.375 rows, 12337.5 cpu, 0.0 io, 84.375 rate, 3.0
memory, 1.0 window, 146.22222222222223 R}, id = 266
SaberJoinRel(condition=[AND(=($3, $6), >($4, $7))], joinType=[inner]):
rowcount = 18750.0, cumulative cost = {20750.0 rows, 12000.0 cpu, 0.0 io,
75.0 rate, 2.0 memory, 1.0 window, 160.0 R}, id = 264
SaberAggrCalcRel: rowcount = 500.0, cumulative cost = {1000.0 rows,
2000.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0 window, 4.0 R}, id = 270
SaberTableScanRel(table=[[s, orders]]): rowcount = 500.0, cumulative
cost = {500.0 rows, 0.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0 window,
0.0 R}, id = 138
SaberAggrCalcRel: rowcount = 500.0, cumulative cost = {1000.0 rows,
2000.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0 window, 4.0 R}, id = 268
SaberTableScanRel(table=[[s, orders]]): rowcount = 500.0, cumulative
cost = {500.0 rows, 0.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0 window,
0.0 R}, id = 138
When I *omit* *AggregateReduceFunctionsRule*, I get:
Exception in thread "main"
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: Node
[rel#98:Subset#9.LOGICAL.[]] could not be implemented; planner state:
Root: rel#98:Subset#9.LOGICAL.[]
Original rel:
Sets:
Set#0, type: RecordType(TIMESTAMP(0) rowtime, INTEGER orderid, INTEGER
productid, INTEGER units, INTEGER customerid)
rel#79:Subset#0.NONE.[0], best=rel#14, importance=0.4782969000000001
rel#14:LogicalTableScan.NONE.[[0]](table=[s, orders]), rowcount=500.0,
cumulative cost={huge}
rel#115:Subset#0.LOGICAL.[0], best=rel#118, importance=0.531441
rel#118:SaberTableScanRel.LOGICAL.[[0]](table=[s, orders]), rowcount=500.0,
cumulative cost={500.0 rows, 0.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0
window, 0.0 R}
Set#1, type: RecordType(TIMESTAMP(0) rowtime, INTEGER customerid,
TIMESTAMP(0) $f2, INTEGER units)
rel#81:Subset#1.NONE.[], best=rel#80, importance=0.531441
rel#80:LogicalProject.NONE.[[0],
[2]](input=rel#79:Subset#0.NONE.[0],rowtime=$0,customerid=$4,$f2=HOP($0,
1000, 3600000),units=$3), rowcount=500.0, cumulative cost={huge}
rel#111:Subset#1.LOGICAL.[], best=rel#116, importance=0.5904900000000001
rel#116:SaberProjectRel.LOGICAL.[[0],
[2]](input=rel#115:Subset#0.LOGICAL.[0],rowtime=$0,customerid=$4,$f2=HOP($0,
1000, 3600000),units=$3), rowcount=500.0, cumulative cost={1000.0 rows,
2000.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0 window, 4.0 R}
Set#2, type: RecordType(TIMESTAMP(0) rowtime, INTEGER customerid,
TIMESTAMP(0) $f2, INTEGER globalAvgLoad)
rel#83:Subset#2.NONE.[], best=rel#82, importance=0.5904900000000001
rel#82:LogicalAggregate.NONE.[](input=rel#81:Subset#1.NONE.[],group={0, 1,
2},globalAvgLoad=AVG($3)), rowcount=50.0, cumulative cost={huge}
rel#107:Subset#2.LOGICAL.[], best=rel#112, importance=0.6561
rel#112:SaberAggregateRel.LOGICAL.[](input=rel#111:Subset#1.LOGICAL.[],group={0,
1, 2},globalAvgLoad=AVG($3)), rowcount=50.0, cumulative cost={1056.25 rows,
4250.0 cpu, 0.0 io, 562.5 rate, 1.0 memory, 1.0 window, 7.555555555555555 R}
Set#3, type: RecordType(TIMESTAMP(0) rowtime, INTEGER customerid, INTEGER
globalAvgLoad)
rel#85:Subset#3.NONE.[], best=rel#84, importance=0.6561
rel#84:LogicalProject.NONE.[](input=rel#83:Subset#2.NONE.[],rowtime=$0,customerid=$1,globalAvgLoad=$3),
rowcount=50.0, cumulative cost={huge}
rel#104:Subset#3.LOGICAL.[], best=rel#108, importance=0.7290000000000001
rel#108:SaberProjectRel.LOGICAL.[](input=rel#107:Subset#2.LOGICAL.[],rowtime=$0,customerid=$1,globalAvgLoad=$3),
rowcount=50.0, cumulative cost={1106.25 rows, 6500.0 cpu, 0.0 io, 562.5
rate, 1.0 memory, 1.0 window, 11.555555555555555 R}
Set#4, type: RecordType(TIMESTAMP(0) rowtime, INTEGER productid, INTEGER
orderid, INTEGER customerid, TIMESTAMP(0) $f4, INTEGER units)
rel#87:Subset#4.NONE.[], best=rel#86, importance=0.531441
rel#86:LogicalProject.NONE.[[0],
[4]](input=rel#79:Subset#0.NONE.[0],rowtime=$0,productid=$2,orderid=$1,customerid=$4,$f4=HOP($0,
1000, 3600000),units=$3), rowcount=500.0, cumulative cost={huge}
rel#113:Subset#4.LOGICAL.[], best=rel#117, importance=0.5904900000000001
rel#117:SaberProjectRel.LOGICAL.[[0],
[4]](input=rel#115:Subset#0.LOGICAL.[0],rowtime=$0,productid=$2,orderid=$1,customerid=$4,$f4=HOP($0,
1000, 3600000),units=$3), rowcount=500.0, cumulative cost={1000.0 rows,
2000.0 cpu, 0.0 io, 500.0 rate, 0.0 memory, 1.0 window, 4.0 R}
Set#5, type: RecordType(TIMESTAMP(0) rowtime, INTEGER productid, INTEGER
orderid, INTEGER customerid, TIMESTAMP(0) $f4, INTEGER localAvgLoad)
rel#89:Subset#5.NONE.[], best=rel#88, importance=0.5904900000000001
rel#88:LogicalAggregate.NONE.[](input=rel#87:Subset#4.NONE.[],group={0, 1,
2, 3, 4},localAvgLoad=AVG($5)), rowcount=50.0, cumulative cost={huge}
rel#109:Subset#5.LOGICAL.[], best=rel#114, importance=0.6561
rel#114:SaberAggregateRel.LOGICAL.[](input=rel#113:Subset#4.LOGICAL.[],group={0,
1, 2, 3, 4},localAvgLoad=AVG($5)), rowcount=50.0, cumulative cost={1056.25
rows, 4250.0 cpu, 0.0 io, 562.5 rate, 1.0 memory, 1.0 window,
7.555555555555555 R}
Set#6, type: RecordType(TIMESTAMP(0) rowtime, INTEGER productid, INTEGER
orderid, INTEGER customerid, INTEGER localAvgLoad)
rel#91:Subset#6.NONE.[], best=rel#90, importance=0.6561
rel#90:LogicalProject.NONE.[](input=rel#89:Subset#5.NONE.[],rowtime=$0,productid=$1,orderid=$2,customerid=$3,localAvgLoad=$5),
rowcount=50.0, cumulative cost={huge}
rel#105:Subset#6.LOGICAL.[], best=rel#110, importance=0.7290000000000001
rel#110:SaberProjectRel.LOGICAL.[](input=rel#109:Subset#5.LOGICAL.[],rowtime=$0,productid=$1,orderid=$2,customerid=$3,localAvgLoad=$5),
rowcount=50.0, cumulative cost={1106.25 rows, 6500.0 cpu, 0.0 io, 562.5
rate, 1.0 memory, 1.0 window, 11.555555555555555 R}
Set#7, type: RecordType(TIMESTAMP(0) rowtime, INTEGER customerid, INTEGER
globalAvgLoad, TIMESTAMP(0) rowtime0, INTEGER productid, INTEGER orderid,
INTEGER customerid0, INTEGER localAvgLoad)
rel#93:Subset#7.NONE.[], best=rel#92, importance=0.7290000000000001
rel#92:LogicalJoin.NONE.[](left=rel#85:Subset#3.NONE.[],right=rel#91:Subset#6.NONE.[],condition=AND(=($6,
$1), >($7, $2)),joinType=inner), rowcount=187.5, cumulative cost={huge}
rel#102:Subset#7.LOGICAL.[], best=rel#106, importance=0.81
rel#106:SaberJoinRel.LOGICAL.[](left=rel#104:Subset#3.LOGICAL.[],right=rel#105:Subset#6.LOGICAL.[],condition=AND(=($6,
$1), >($7, $2)),joinType=inner), rowcount=187.5, cumulative cost={2400.0
rows, 22000.0 cpu, 0.0 io, 84.375 rate, 4.0 memory, 1.0 window,
260.74074074074076 R}
Set#8, type: RecordType(TIMESTAMP(0) rowtime, INTEGER customerid)
rel#95:Subset#8.NONE.[], best=rel#94, importance=0.81
rel#94:LogicalProject.NONE.[](input=rel#93:Subset#7.NONE.[],rowtime=$3,customerid=$6),
rowcount=187.5, cumulative cost={huge}
rel#100:Subset#8.LOGICAL.[], best=rel#103, importance=0.9
rel#103:SaberProjectRel.LOGICAL.[](input=rel#102:Subset#7.LOGICAL.[],rowtime=$3,customerid=$6),
rowcount=187.5, cumulative cost={2587.5 rows, Infinity cpu, 0.0 io,
1.7976931348623157E308 rate, 1.7976931348623157E308 memory,
1.3482698511467366E307 window, 1.7976931348623157E308 R}
Set#9, type: RecordType(TIMESTAMP(0) rowtime, INTEGER customerid, BIGINT
EXPR$2)
rel#97:Subset#9.NONE.[], best=rel#96, importance=0.9
rel#96:LogicalAggregate.NONE.[](input=rel#95:Subset#8.NONE.[],group={0,
1},EXPR$2=COUNT()), rowcount=18.75, cumulative cost={huge}
rel#98:Subset#9.LOGICAL.[], best=null, importance=1.0
rel#99:AbstractConverter.LOGICAL.[](input=rel#97:Subset#9.NONE.[],convention=LOGICAL,sort=[]),
rowcount=18.75, cumulative cost={Infinity rows, Infinity cpu, Infinity io,
1.7976931348623157E308 rate, Infinity memory, 1.7976931348623157E308
window, 1.7976931348623157E308 R}
rel#101:SaberAggregateRel.LOGICAL.[](input=rel#100:Subset#8.LOGICAL.[],group={0,
1},EXPR$2=COUNT()), rowcount=18.75, cumulative cost={2608.59375 rows,
Infinity cpu, 0.0 io, 1.7976931348623157E308 rate, Infinity memory,
1.3482698511467366E307 window, 1.7976931348623157E308 R}
at
org.apache.calcite.plan.volcano.RelSubset$CheapestPlanReplacer.visit(RelSubset.java:443)
at
org.apache.calcite.plan.volcano.RelSubset.buildCheapestPlan(RelSubset.java:293)
at
org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:666)
at org.apache.calcite.tools.Programs$RuleSetProgram.run(Programs.java:368)
at org.apache.calcite.prepare.PlannerImpl.transform(PlannerImpl.java:313)
at calcite.planner.SaberPlanner.getLogicalPlan(SaberPlanner.java:215)
at calcite.Tester.main(Tester.java:234)
Sorry for the long email...
Thank you!