[ 
https://issues.apache.org/jira/browse/CALCITE-4882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441935#comment-17441935
 ] 

James Starr commented on CALCITE-4882:
--------------------------------------

[~jnadeau], this is great work.  I am seeing large per run variance getter than 
the error would suggest, sometimes up to 25% slower.  However, generally I am 
repeating your results and do not want to debug the harness or my setup to 
track it down.  When I had previously profiled testJoinFiveWay previously, only 
5 to 20% of the time was spent in the volcano planner, compared to most of the 
time being spent else where.  While not relevant for this discussion, their are 
pretty big limitation on detecting smaller linear/near linear changes. 

{noformat}
Benchmark                                 Mode  Cnt     Score     Error  Units
MetadataBenchmark.janino                  avgt   10  732.861 ±  43.671  ms/op
MetadataBenchmark.janinoWithCompile       avgt   10  744.441 ±  30.201  ms/op
MetadataBenchmark.lambda                  avgt   10  709.299 ±  36.495  ms/op
MetadataBenchmark.lambdaWithConstruction  avgt   10  867.560 ± 657.715  ms/op
{noformat}

I ran the benchmark with the the query below in benchmark.  I think further 
isolating the planning would be the way to go if this test were to be improved. 
 I also believe that the algorithm is suppose to exponential, on the number of 
joins, but these results show super exponential growth, which suggests that 
something else is going on in both implementation, perhaps a change from being 
CPU bound to memory bound.

{code:java}
CalciteAssert.that()
        .with(CalciteAssert.Config.FOODMART_CLONE)
        .query("select \"store\".\"store_country\" as \"c0\",\n"
            + " \"time_by_day\".\"the_year\" as \"c1\",\n"
            + " \"product_class\".\"product_family\" as \"c2\",\n"
            + " count(\"sales_fact_1997\".\"product_id\") as \"m0\",\n"
            + " count(\"sales_fact_1998\".\"product_id\") as \"m0\"\n"
            + "from \"store\" as \"store\",\n"
            + " \"sales_fact_1997\" as \"sales_fact_1997\",\n"
            + " \"sales_fact_1998\" as \"sales_fact_1998\",\n"
            + " \"promotion\" as \"promotion\",\n"
            + " \"time_by_day\" as \"time_by_day\",\n"
            + " \"product_class\" as \"product_class\",\n"
            + " \"product\" as \"product\"\n"
            + "where \"sales_fact_1997\".\"store_id\" = 
\"store\".\"store_id\"\n"
            + "and \"store\".\"store_country\" = 'USA'\n"
            + "and \"sales_fact_1997\".\"time_id\" = 
\"time_by_day\".\"time_id\"\n"
            + "and \"sales_fact_1998\".\"time_id\" = 
\"time_by_day\".\"time_id\"\n"
            + "and \"sales_fact_1997\".\"promotion_id\" = 
\"promotion\".\"promotion_id\"\n"
            + "and \"sales_fact_1998\".\"promotion_id\" = 
\"promotion\".\"promotion_id\"\n"
            + "and \"time_by_day\".\"the_year\" = 1997\n"
            + "and \"sales_fact_1997\".\"product_id\" = 
\"product\".\"product_id\"\n"
            + "and \"product\".\"product_class_id\" = 
\"product_class\".\"product_class_id\"\n"
            + "group by \"store\".\"store_country\",\n"
            + " \"time_by_day\".\"the_year\",\n"
            + " \"product_class\".\"product_family\"")
        .explainContains(""
            + "EnumerableCalc(expr#0..3=["
            + "{inputs}], c0=[$t0], c1=[$t2], c2=[$t1], m0=[$t3], m00=[$t3])\n"
            + "  EnumerableAggregate(group=[{1, 9, 12}], m0=[COUNT()])\n"
            + "    EnumerableMergeJoin(condition=[AND(=($3, $11), =($4, $10))], 
"
            + "joinType=[inner])\n      "
            + "EnumerableSort(sort0=[$3], sort1=[$4], dir0=[ASC], dir1=[ASC])\n"
            + "        EnumerableMergeJoin(condition=[=($2, $7)], 
joinType=[inner])\n"
            + "          EnumerableSort(sort0=[$2], dir0=[ASC])\n"
            + "            EnumerableHashJoin(condition=[=($0, $5)], 
joinType=[inner])\n"
            + "              EnumerableCalc(expr#0..23=[{inputs}], 
expr#24=['USA':VARCHAR(30)], "
            + "expr#25=[=($t9, $t24)], store_id=[$t0], store_country=[$t9], 
$condition=[$t25])\n"
            + "                EnumerableTableScan(table=[[foodmart2, 
store]])\n"
            + "              EnumerableCalc(expr#0..7=[{inputs}], 
proj#0..1=[{exprs}],"
            + " promotion_id=[$t3], store_id=[$t4])\n"
            + "                EnumerableTableScan(table=[[foodmart2, 
sales_fact_1997]])\n"
            + "          EnumerableHashJoin(condition=[=($0, $2)], 
joinType=[inner])\n"
            + "            EnumerableCalc(expr#0..14=[{inputs}], 
proj#0..1=[{exprs}])\n"
            + "              EnumerableTableScan(table=[[foodmart2, 
product]])\n"
            + "            EnumerableCalc(expr#0..4=[{inputs}], 
product_class_id=[$t0],"
            + " product_family=[$t4])\n              "
            + "EnumerableTableScan(table=[[foodmart2, product_class]])\n"
            + "      EnumerableSort(sort0=[$1], sort1=[$0], dir0=[ASC], 
dir1=[ASC])\n"
            + "        EnumerableHashJoin(condition=[=($0, $4)], 
joinType=[inner])\n"
            + "          EnumerableCalc(expr#0..6=[{inputs}], 
promotion_id=[$t0])\n"
            + "            EnumerableTableScan(table=[[foodmart2, 
promotion]])\n"
            + "          EnumerableHashJoin(condition=[=($0, $2)], 
joinType=[inner])\n"
            + "            EnumerableCalc(expr#0..9=[{inputs}], 
expr#10=[CAST($t4):INTEGER],"
            + " expr#11=[1997], expr#12=[=($t10, $t11)], time_id=[$t0], 
the_year=[$t4],"
            + " $condition=[$t12])\n"
            + "              EnumerableTableScan(table=[[foodmart2, 
time_by_day]])\n"
            + "            EnumerableCalc(expr#0..7=[{inputs}], time_id=[$t1],"
            + " promotion_id=[$t3])\n"
            + "              EnumerableTableScan(table=[[foodmart2, 
sales_fact_1998]])\n\n")
        .returns("");
{code}

Results for 7 way join:
{noformat}
Benchmark                 Mode  Cnt      Score      Error  Units
MetadataBenchmark.janino  avgt   10  74516.963 ±  986.897  ms/op
MetadataBenchmark.lambda  avgt   10  75800.010 ± 1107.234  ms/op
{noformat}

> Explore a LambdaMetadataFactory alternative to Janino for metadata retrieval
> ----------------------------------------------------------------------------
>
>                 Key: CALCITE-4882
>                 URL: https://issues.apache.org/jira/browse/CALCITE-4882
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Jacques Nadeau
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to