Stamatis Zampetakis created CALCITE-4212:
--------------------------------------------
Summary: Revisit cost-model to break ties between Enumerable and
Bindable expressions
Key: CALCITE-4212
URL: https://issues.apache.org/jira/browse/CALCITE-4212
Project: Calcite
Issue Type: Improvement
Reporter: Stamatis Zampetakis
Most Enumerable and Bindable expressions use exactly the same cost function.
Depending on the query, this can produce different but equivalent (sub)plans
with exactly the same cost, which makes the chosen plan depend on the order in
which the rules are applied.
For example, consider the following query from
{{DruidAdapterIT#testProject}}:
{code:sql}
select "product_name", 0 as zero
from "foodmart"
order by "product_name";
{code}
At some point during planning the optimizer needs to decide between the
following plans:
+Choice 1+
{noformat}
EnumerableSort(sort0=[$0], dir0=[ASC]), id = 37
  EnumerableInterpreter(subset=[rel#23:RelSubset#1.ENUMERABLE.[]]), id = 43
    DruidQuery(subset=[rel#26:RelSubset#1.BINDABLE.[]], table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z]], projects=[[$3, 0]]), id = 25
{noformat}
+Choice 2+
{noformat}
EnumerableInterpreter, id = 61
  BindableSort(subset=[rel#40:RelSubset#1.BINDABLE.[0]], sort0=[$0], dir0=[ASC]), id = 41
    DruidQuery(subset=[rel#26:RelSubset#1.BINDABLE.[]], table=[[foodmart, foodmart]], intervals=[[1900-01-09T00:00:00.000Z/2992-01-10T00:00:00.000Z]], projects=[[$3, 0]]), id = 25
{noformat}
Both choices have exactly the same cost since {{BindableSort}} and
{{EnumerableSort}} use the same cost function ({{Sort#computeSelfCost}},
{{RelMdRowCount#getRowCount(Sort, RelMetadataQuery)}}).
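The effect of such a tie can be illustrated with a small, self-contained sketch. It does not use Calcite's API: {{Plan}}, {{pickCheapest}}, {{winner}}, the cost value 100.0 and the factor 1.1 are all hypothetical, and the selector is only assumed to keep the first of several equal-cost candidates. It also arbitrarily penalizes the Bindable side; which convention should win a tie is a design decision this sketch does not settle.

```java
import java.util.ArrayList;
import java.util.List;

public class TieBreakDemo {
  /** Hypothetical stand-in for a candidate plan; not a Calcite class. */
  static final class Plan {
    final String name;
    final double cost;

    Plan(String name, double cost) {
      this.name = name;
      this.cost = cost;
    }
  }

  /**
   * Toy selector (an assumption, not Calcite's planner): a later candidate
   * replaces the current best only if it is strictly cheaper, so among
   * equal-cost candidates the first one registered wins.
   */
  static Plan pickCheapest(List<Plan> candidates) {
    Plan best = null;
    for (Plan p : candidates) {
      if (best == null || p.cost < best.cost) {
        best = p;
      }
    }
    return best;
  }

  /** Returns the name of the winning sort for the given costs and registration order. */
  static String winner(double enumerableCost, double bindableCost, boolean bindableFirst) {
    Plan enumerable = new Plan("EnumerableSort", enumerableCost);
    Plan bindable = new Plan("BindableSort", bindableCost);
    List<Plan> candidates = new ArrayList<>();
    if (bindableFirst) {
      candidates.add(bindable);
      candidates.add(enumerable);
    } else {
      candidates.add(enumerable);
      candidates.add(bindable);
    }
    return pickCheapest(candidates).name;
  }

  public static void main(String[] args) {
    double sortCost = 100.0;       // arbitrary illustrative value
    double bindablePenalty = 1.1;  // hypothetical tie-breaking factor

    // Identical costs: the result flips with the registration order.
    System.out.println(winner(sortCost, sortCost, false));
    System.out.println(winner(sortCost, sortCost, true));

    // Distinct costs: the result is the same in both orders.
    System.out.println(winner(sortCost, sortCost * bindablePenalty, false));
    System.out.println(winner(sortCost, sortCost * bindablePenalty, true));
  }
}
```

With identical costs the winner follows the registration order; once the two costs differ by any factor, the same candidate wins regardless of order.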
The issue can appear with various other expressions such as Project, SetOp,
etc.
Although the example is taken from the Druid adapter, the same can happen in
any other use case where both Bindable and Enumerable conventions are used
during planning.