[
https://issues.apache.org/jira/browse/CALCITE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17995809#comment-17995809
]
Julian Hyde commented on CALCITE-7083:
--------------------------------------
[~dongsl], Thank you for adding examples. That helps a lot.
For metadata tests, it is usually sufficient to write a simple query on a
simple schema. For example, your case.1 is analogous to the query
{code}
select deptno, sum(sal)
from emp
where deptno = 10
group by deptno
{code}
and evaluating {{getDistinctRowCount(Aggregate, [1], null)}}. The answer should
be something between 0 and 1, because the query returns at most one row.
For case.2, you change the query to
{code}
select deptno, sum(sal)
from emp
where job = 'MANAGER'
group by deptno
{code}
and the answer should be about 3, because that is the number of distinct
deptno values.
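As a rough illustration of the two cases, the sketch below runs both queries over a small in-memory table and counts the distinct values of output column 1, sum(sal). The six sample rows and the names {{DistinctRowCountCases}}, {{sumSalByDeptno}} and {{distinctSums}} are hypothetical. They are not the Calcite test schema or API, and the code computes exact counts where the metadata handler would only estimate.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DistinctRowCountCases {
  /** Hypothetical sample row; not the real Calcite test data. */
  static final class Emp {
    final String job;
    final int deptno;
    final int sal;

    Emp(String job, int deptno, int sal) {
      this.job = job;
      this.deptno = deptno;
      this.sal = sal;
    }
  }

  static final List<Emp> EMP = List.of(
      new Emp("CLERK", 10, 1300),
      new Emp("MANAGER", 10, 2450),
      new Emp("MANAGER", 20, 2975),
      new Emp("ANALYST", 20, 3000),
      new Emp("MANAGER", 30, 2850),
      new Emp("SALESMAN", 30, 1600));

  /** Emulates "select deptno, sum(sal) from emp where FILTER group by deptno". */
  static Map<Integer, Integer> sumSalByDeptno(Predicate<Emp> filter) {
    Map<Integer, Integer> result = new LinkedHashMap<>();
    for (Emp e : EMP) {
      if (filter.test(e)) {
        result.merge(e.deptno, e.sal, Integer::sum);
      }
    }
    return result;
  }

  /** Counts the distinct values of output column 1, that is, sum(sal). */
  static int distinctSums(Map<Integer, Integer> rows) {
    return new HashSet<>(rows.values()).size();
  }

  public static void main(String[] args) {
    Map<Integer, Integer> case1 = sumSalByDeptno(e -> e.deptno == 10);
    Map<Integer, Integer> case2 = sumSalByDeptno(e -> e.job.equals("MANAGER"));
    // On this sample data: case 1 has 1 row and 1 distinct sum,
    // case 2 has 3 rows and 3 distinct sums.
    System.out.println("case 1: rows=" + case1.size()
        + ", distinct sums=" + distinctSums(case1));
    System.out.println("case 2: rows=" + case2.size()
        + ", distinct sums=" + distinctSums(case2));
  }
}
```

In both cases the number of distinct sums cannot exceed the number of rows the aggregate returns.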
I think that the (output) row count should be an upper bound on all
{{getDistinctRowCount}} metadata queries.
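One way to picture that bound is a small helper that caps a distinct-row-count estimate by the row-count estimate. This is only a sketch: {{DistinctRowCountBound}} and {{capByRowCount}} are hypothetical names, not existing Calcite methods, and it assumes that a null estimate means "unknown".

```java
public class DistinctRowCountBound {
  /**
   * Caps a distinct-row-count estimate by the row-count estimate.
   * Assumption: null means "unknown"; if either input is unknown,
   * the distinct-row-count estimate is returned unchanged.
   */
  static Double capByRowCount(Double distinctRowCount, Double rowCount) {
    if (distinctRowCount == null || rowCount == null) {
      return distinctRowCount;
    }
    return Math.min(distinctRowCount, rowCount);
  }

  public static void main(String[] args) {
    // An estimate of 5 distinct values over 3 output rows is capped at 3.
    System.out.println(capByRowCount(5.0, 3.0));
    // An estimate already below the row count is left alone.
    System.out.println(capByRowCount(0.5, 1.0));
  }
}
```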
> RelMdDistinctRowCount aggregates implementation problems
> --------------------------------------------------------
>
> Key: CALCITE-7083
> URL: https://issues.apache.org/jira/browse/CALCITE-7083
> Project: Calcite
> Issue Type: Bug
> Components: core
> Affects Versions: 1.40.0
> Reporter: Claude Brisson
> Assignee: Silun Dong
> Priority: Major
>
> The default implementation of getDistinctRowCount for aggregates has several
> problems:
> - when determining the pushable predicates, it makes the assumption that the
> aggregate group key is a zero-based range, which is not necessarily the case
> (the indices in the aggregate group key are the child indices, the predicates
> are expressed in terms of the zero-based output range)
> - if there is any aggregated column in the queried group key, then it makes
> no sense to query the distinct values on the aggregate input; the handler
> should return null or (at most) the full cardinality of the aggregate