[
https://issues.apache.org/jira/browse/CALCITE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441935#comment-17441935
]
James Starr commented on CALCITE-4882:
--------------------------------------
[~jnadeau], this is great work. I am seeing large per run variance getter than
the error would suggest, sometimes up to 25% slower. However, generally I am
repeating your results and do not want to debug the harness or my setup to
track it down. When I had previously profiled testJoinFiveWay previously, only
5 to 20% of the time was spent in the volcano planner, compared to most of the
time being spent else where. While not relevant for this discussion, their are
pretty big limitation on detecting smaller linear/near linear changes.
{noformat}
Benchmark Mode Cnt Score Error Units
MetadataBenchmark.janino avgt 10 732.861 ± 43.671 ms/op
MetadataBenchmark.janinoWithCompile avgt 10 744.441 ± 30.201 ms/op
MetadataBenchmark.lambda avgt 10 709.299 ± 36.495 ms/op
MetadataBenchmark.lambdaWithConstruction avgt 10 867.560 ± 657.715 ms/op
{noformat}
I ran the benchmark with the the query below in benchmark. I think further
isolating the planning would be the way to go if this test were to be improved.
I also believe that the algorithm is suppose to exponential, on the number of
joins, but these results show super exponential growth, which suggests that
something else is going on in both implementation, perhaps a change from being
CPU bound to memory bound.
{code:java}
CalciteAssert.that()
.with(CalciteAssert.Config.FOODMART_CLONE)
.query("select \"store\".\"store_country\" as \"c0\",\n"
+ " \"time_by_day\".\"the_year\" as \"c1\",\n"
+ " \"product_class\".\"product_family\" as \"c2\",\n"
+ " count(\"sales_fact_1997\".\"product_id\") as \"m0\",\n"
+ " count(\"sales_fact_1998\".\"product_id\") as \"m0\"\n"
+ "from \"store\" as \"store\",\n"
+ " \"sales_fact_1997\" as \"sales_fact_1997\",\n"
+ " \"sales_fact_1998\" as \"sales_fact_1998\",\n"
+ " \"promotion\" as \"promotion\",\n"
+ " \"time_by_day\" as \"time_by_day\",\n"
+ " \"product_class\" as \"product_class\",\n"
+ " \"product\" as \"product\"\n"
+ "where \"sales_fact_1997\".\"store_id\" =
\"store\".\"store_id\"\n"
+ "and \"store\".\"store_country\" = 'USA'\n"
+ "and \"sales_fact_1997\".\"time_id\" =
\"time_by_day\".\"time_id\"\n"
+ "and \"sales_fact_1998\".\"time_id\" =
\"time_by_day\".\"time_id\"\n"
+ "and \"sales_fact_1997\".\"promotion_id\" =
\"promotion\".\"promotion_id\"\n"
+ "and \"sales_fact_1998\".\"promotion_id\" =
\"promotion\".\"promotion_id\"\n"
+ "and \"time_by_day\".\"the_year\" = 1997\n"
+ "and \"sales_fact_1997\".\"product_id\" =
\"product\".\"product_id\"\n"
+ "and \"product\".\"product_class_id\" =
\"product_class\".\"product_class_id\"\n"
+ "group by \"store\".\"store_country\",\n"
+ " \"time_by_day\".\"the_year\",\n"
+ " \"product_class\".\"product_family\"")
.explainContains(""
+ "EnumerableCalc(expr#0..3=["
+ "{inputs}], c0=[$t0], c1=[$t2], c2=[$t1], m0=[$t3], m00=[$t3])\n"
+ " EnumerableAggregate(group=[{1, 9, 12}], m0=[COUNT()])\n"
+ " EnumerableMergeJoin(condition=[AND(=($3, $11), =($4, $10))],
"
+ "joinType=[inner])\n "
+ "EnumerableSort(sort0=[$3], sort1=[$4], dir0=[ASC], dir1=[ASC])\n"
+ " EnumerableMergeJoin(condition=[=($2, $7)],
joinType=[inner])\n"
+ " EnumerableSort(sort0=[$2], dir0=[ASC])\n"
+ " EnumerableHashJoin(condition=[=($0, $5)],
joinType=[inner])\n"
+ " EnumerableCalc(expr#0..23=[{inputs}],
expr#24=['USA':VARCHAR(30)], "
+ "expr#25=[=($t9, $t24)], store_id=[$t0], store_country=[$t9],
$condition=[$t25])\n"
+ " EnumerableTableScan(table=[[foodmart2,
store]])\n"
+ " EnumerableCalc(expr#0..7=[{inputs}],
proj#0..1=[{exprs}],"
+ " promotion_id=[$t3], store_id=[$t4])\n"
+ " EnumerableTableScan(table=[[foodmart2,
sales_fact_1997]])\n"
+ " EnumerableHashJoin(condition=[=($0, $2)],
joinType=[inner])\n"
+ " EnumerableCalc(expr#0..14=[{inputs}],
proj#0..1=[{exprs}])\n"
+ " EnumerableTableScan(table=[[foodmart2,
product]])\n"
+ " EnumerableCalc(expr#0..4=[{inputs}],
product_class_id=[$t0],"
+ " product_family=[$t4])\n "
+ "EnumerableTableScan(table=[[foodmart2, product_class]])\n"
+ " EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC],
dir1=[ASC])\n"
+ " EnumerableHashJoin(condition=[=($0, $4)],
joinType=[inner])\n"
+ " EnumerableCalc(expr#0..6=[{inputs}],
promotion_id=[$t0])\n"
+ " EnumerableTableScan(table=[[foodmart2,
promotion]])\n"
+ " EnumerableHashJoin(condition=[=($0, $2)],
joinType=[inner])\n"
+ " EnumerableCalc(expr#0..9=[{inputs}],
expr#10=[CAST($t4):INTEGER],"
+ " expr#11=[1997], expr#12=[=($t10, $t11)], time_id=[$t0],
the_year=[$t4],"
+ " $condition=[$t12])\n"
+ " EnumerableTableScan(table=[[foodmart2,
time_by_day]])\n"
+ " EnumerableCalc(expr#0..7=[{inputs}], time_id=[$t1],"
+ " promotion_id=[$t3])\n"
+ " EnumerableTableScan(table=[[foodmart2,
sales_fact_1998]])\n\n")
.returns("");
{code}
Results for 7 way join:
{noformat}
Benchmark Mode Cnt Score Error Units
MetadataBenchmark.janino avgt 10 74516.963 ± 986.897 ms/op
MetadataBenchmark.lambda avgt 10 75800.010 ± 1107.234 ms/op
{noformat}
> Explore a LambdaMetadataFactory alternative to Janino for metadata retrieval
> ----------------------------------------------------------------------------
>
> Key: CALCITE-4882
> URL: https://issues.apache.org/jira/browse/CALCITE-4882
> Project: Calcite
> Issue Type: Improvement
> Reporter: Jacques Nadeau
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)