Maryann Xue created CALCITE-938:
-----------------------------------

             Summary: Make Aggregate return more accurate rowCount if groupSet 
is unique keys.
                 Key: CALCITE-938
                 URL: https://issues.apache.org/jira/browse/CALCITE-938
             Project: Calcite
          Issue Type: Improvement
            Reporter: Maryann Xue
            Assignee: Maryann Xue
            Priority: Minor


If the columns in a "select distinct" are already distinct, there are two 
equivalent rels: one before and one after AggregateRemoveRule is applied.
{code}
agg
 |                  input
input
10.0                100.0
{code}
With the default implementation of rel metadata, the rowCount of the 
"before" rel is only 1/10 of that of the "after" rel, even though the two rels 
are equivalent and the "after" rel is definitely cheaper. As a result, the 
Volcano planner would most likely either fail to pick the cheapest rel or end 
up in an inconsistent state due to CALCITE-830.
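
The mismatch can be illustrated with a small standalone sketch. This is not 
Calcite code: the class and method names below are hypothetical, and the 1/10 
factor is taken only from the numbers in the diagram above. It assumes that 
when the groupSet contains a unique key of the input, every input row forms 
its own group, so the aggregate should report the input's rowCount.

```java
import java.util.List;
import java.util.Set;

// Hypothetical toy model; none of these names are part of Calcite's API.
class AggregateRowCountSketch {

    // Heuristic from the description above: an aggregate is assumed to
    // return 1/10 of its input rows, regardless of the group keys.
    public static double naiveRowCount(double inputRows) {
        return inputRows / 10.0;
    }

    // Proposed behaviour (sketch): if the group columns cover at least one
    // unique key of the input, no two input rows can share a group, so the
    // output rowCount equals the input rowCount.
    public static double uniqueAwareRowCount(double inputRows,
                                             Set<Integer> groupSet,
                                             List<Set<Integer>> uniqueKeys) {
        for (Set<Integer> key : uniqueKeys) {
            if (groupSet.containsAll(key)) {
                return inputRows;
            }
        }
        // No unique key is covered: fall back to the naive heuristic.
        return naiveRowCount(inputRows);
    }

    public static void main(String[] args) {
        double inputRows = 100.0;
        Set<Integer> groupSet = Set.of(0);                // group by column 0
        List<Set<Integer>> uniqueKeys = List.of(Set.of(0)); // column 0 is unique

        System.out.println("naive:        " + naiveRowCount(inputRows));
        System.out.println("unique-aware: "
            + uniqueAwareRowCount(inputRows, groupSet, uniqueKeys));
    }
}
```

With the unique-aware estimate, the "before" and "after" rels report the same 
rowCount, so the comparison between them comes down to the cost of the 
aggregate itself.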

An example (based on the EnumerableRel cost model):
The plan for
{code}
select empno, d.deptno
from "scott".emp
join (select distinct deptno from "scott".dept) d
using (deptno);
{code}
would be
{code}
EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
  EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
    EnumerableAggregate(group=[$0])
      EnumerableTableScan(table=[[scott, DEPT]])
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
      EnumerableTableScan(table=[[scott, EMP]])
{code}
while it should be
{code}
EnumerableCalc(expr#0..2=[{inputs}], EMPNO=[$t1], DEPTNO=[$t0])
  EnumerableJoin(condition=[=($0, $2)], joinType=[inner])
    EnumerableCalc(expr#0..2=[{inputs}], DEPTNO=[$t0])
      EnumerableTableScan(table=[[scott, DEPT]])
    EnumerableCalc(expr#0..7=[{inputs}], EMPNO=[$t0], DEPTNO=[$t7])
      EnumerableTableScan(table=[[scott, EMP]])
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
