An update.
I’ve run the algorithm on TPC-DS Q17. Here’s the output.
EnumerableProjectRel(I_ITEM_ID=[$0], I_ITEM_DESC=[$1], S_STATE=[$2],
STORE_SALES_QUANTITYCOUNT=[$3], STORE_SALES_QUANTITYAVE=[$4],
STORE_SALES_QUANTITYSTDEV=[$5], STORE_SALES_QUANTITYCOV=[/($5, $4)],
AS_STORE_RETURNS_QUANTITYCOUNT=[$6], AS_STORE_RETURNS_QUANTITYAVE=[$7],
AS_STORE_RETURNS_QUANTITYSTDEV=[$8], STORE_RETURNS_QUANTITYCOV=[/($8, $7)],
CATALOG_SALES_QUANTITYCOUNT=[$9], CATALOG_SALES_QUANTITYAVE=[$10],
CATALOG_SALES_QUANTITYSTDEV=[/($11, $10)], CATALOG_SALES_QUANTITYCOV=[/($11,
$10)]): rowcount = 5.434029018852197E26, cumulative cost =
{1.618185849567114E30 rows, 6.412154242245593E28 cpu, 0.0 io}
EnumerableSortRel(sort0=[$0], sort1=[$1], sort2=[$2], dir0=[ASC], dir1=[ASC],
dir2=[ASC]): rowcount = 5.434029018852197E26, cumulative cost =
{1.6176424466652288E30 rows, 5.597049889417763E28 cpu, 0.0 io}
EnumerableProjectRel(I_ITEM_ID=[$0], I_ITEM_DESC=[$1], S_STATE=[$2],
STORE_SALES_QUANTITYCOUNT=[$3], STORE_SALES_QUANTITYAVE=[CAST(/($4,
$5)):JavaType(class java.lang.Integer)],
STORE_SALES_QUANTITYSTDEV=[CAST(POWER(/(-($6, /(*($4, $4), $5)), CASE(=($5, 1),
null, -($5, 1))), 0.5)):JavaType(class java.lang.Integer)],
AS_STORE_RETURNS_QUANTITYCOUNT=[$7], AS_STORE_RETURNS_QUANTITYAVE=[CAST(/($8,
$7)):JavaType(class java.lang.Integer)],
AS_STORE_RETURNS_QUANTITYSTDEV=[CAST(POWER(/(-($9, /(*($8, $8), $7)),
CASE(=($7, 1), null, -($7, 1))), 0.5)):JavaType(class java.lang.Integer)],
CATALOG_SALES_QUANTITYCOUNT=[$10], CATALOG_SALES_QUANTITYAVE=[CAST(/($11,
$10)):JavaType(class java.lang.Integer)], $f11=[CAST(POWER(/(-($12, /(*($11,
$11), $10)), CASE(=($10, 1), null, -($10, 1))), 0.5)):JavaType(class
java.lang.Integer)]): rowcount = 5.434029018852197E26, cumulative cost =
{1.1954863841615548E28 rows, 5.5427095992292415E28 cpu, 0.0 io}
EnumerableAggregateRel(group=[{0, 1, 2}],
STORE_SALES_QUANTITYCOUNT=[COUNT()], agg#1=[SUM($3)], agg#2=[COUNT($3)],
agg#3=[SUM($6)], AS_STORE_RETURNS_QUANTITYCOUNT=[COUNT($4)], agg#5=[SUM($4)],
agg#6=[SUM($7)], CATALOG_SALES_QUANTITYCOUNT=[COUNT($5)], agg#8=[SUM($5)],
agg#9=[SUM($8)]): rowcount = 5.434029018852197E26, cumulative cost =
{1.1411460939730328E28 rows, 4.890626116966977E28 cpu, 0.0 io}
EnumerableProjectRel(I_ITEM_ID=[$58], I_ITEM_DESC=[$61], S_STATE=[$24],
SS_QUANTITY=[$89], SR_RETURN_QUANTITY=[$140], CS_QUANTITY=[$196], $f6=[*($89,
$89)], $f7=[*($140, $140)], $f8=[*($196, $196)]): rowcount =
5.434029018852197E27, cumulative cost = {1.0868058037845108E28 rows,
4.890626116966977E28 cpu, 0.0 io}
EnumerableJoinRel(condition=[AND(=($82, $133), =($81, $132), =($88,
$139))], joinType=[inner]): rowcount = 5.434029018852197E27, cumulative cost =
{5.434029018992911E27 rows, 5065780.0 cpu, 0.0 io}
EnumerableJoinRel(condition=[=($0, $86)], joinType=[inner]):
rowcount = 2.3008402586892598E13, cumulative cost = {4.8588854672852766E13
rows, 3044518.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, STORE]]): rowcount =
12.0, cumulative cost = {12.0 rows, 13.0 cpu, 0.0 io}
EnumerableJoinRel(condition=[=($0, $50)], joinType=[inner]):
rowcount = 1.2782445881607E13, cumulative cost = {1.279800620431134E13 rows,
3044505.0 cpu, 0.0 io}
EnumerableFilterRel(condition=[=(CAST($15):VARCHAR(6) CHARACTER
SET \"ISO-8859-1\" COLLATE \"ISO-8859-1$en_US$primary\", '1998Q1')]): rowcount
= 10957.35, cumulative cost = {84006.35 rows, 146099.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, DATE_DIM]]): rowcount
= 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}
EnumerableJoinRel(condition=[=($0, $24)], joinType=[inner]):
rowcount = 7.7770908E9, cumulative cost = {7.783045975286664E9 rows, 2898406.0
cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, ITEM]]): rowcount =
18000.0, cumulative cost = {18000.0 rows, 18001.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, STORE_SALES]]):
rowcount = 2880404.0, cumulative cost = {2880404.0 rows, 2880405.0 cpu, 0.0 io}
EnumerableJoinRel(condition=[AND(=($31, $79), =($30, $91))],
joinType=[inner]): rowcount = 6.9978029381741304E16, cumulative cost =
{6.9978054204658736E16 rows, 2021262.0 cpu, 0.0 io}
EnumerableJoinRel(condition=[=($28, $0)], joinType=[inner]):
rowcount = 7.87597881975E8, cumulative cost = {7.884434222216867E8 rows,
433614.0 cpu, 0.0 io}
EnumerableFilterRel(condition=[OR(=($15, '1998Q1'), =($15,
'1998Q2'), =($15, '1998Q3'))]): rowcount = 18262.25, cumulative cost =
{91311.25 rows, 146099.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, DATE_DIM]]): rowcount
= 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, STORE_RETURNS]]):
rowcount = 287514.0, cumulative cost = {287514.0 rows, 287515.0 cpu, 0.0 io}
EnumerableJoinRel(condition=[=($28, $0)], joinType=[inner]):
rowcount = 3.94888649445E9, cumulative cost = {3.9520401026966867E9 rows,
1587648.0 cpu, 0.0 io}
EnumerableFilterRel(condition=[OR(=($15, '1998Q1'), =($15,
'1998Q2'), =($15, '1998Q3'))]): rowcount = 18262.25, cumulative cost =
{91311.25 rows, 146099.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, DATE_DIM]]): rowcount
= 73049.0, cumulative cost = {73049.0 rows, 73050.0 cpu, 0.0 io}
EnumerableTableAccessRel(table=[[TPCDS, CATALOG_SALES]]):
rowcount = 1441548.0, cumulative cost = {1441548.0 rows, 1441549.0 cpu, 0.0 io}
The join order is as follows:
(store x (date_dim x (item x store_sales)))
x ((date_dim x store_returns)
x (date_dim x catalog_sales))
It has some parts right, some parts wrong. Note that it chose "date_dim x (item
x store_sales)” when it should have chosen "item x (date_dim x store_sales)”.
The estimated the row-count for date_dim x store_sales was probably way too
high, due to date_dim having a lot of values that don’t correspond to any
sales. (The designers of TPC-DS, in their wisdom, populated the date dimension
with 200 yrs of data.)
Note that item x store_sales is estimated at 7.8e9 rows, but should be 2.8e7
(the cardinality of store_sales, because it’s a many-to-one join).
All things considered, it’s a good start. I’ve checked in the algorithm [1],
disabled by default.
I’ve logged https://issues.apache.org/jira/browse/OPTIQ-357 with some ideas to
make it better.
Julian
[1] http://git-wip-us.apache.org/repos/asf/incubator-optiq/commit/b0aff0df