[jira] [Commented] (CALCITE-7083) RelMdDistinctRowCount aggregates implementation problems

Julian Hyde (Jira) Fri, 04 Jul 2025 14:29:06 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17995809#comment-17995809
 ]


Julian Hyde commented on CALCITE-7083:
--------------------------------------

[~dongsl], thank you for adding examples. That helps a lot.

For metadata tests, it is usually sufficient to write a simple query on a 
simple schema. For example, your case.1 is analogous to the query
{code}
select deptno, sum(sal)
from emp
where deptno = 10
group by deptno
{code}
and evaluating {{getDistinctRowCount(Aggregate, [1], null)}}. The answer should 
be something between 0 and 1, because the query returns at most one row.

For case.2, you change the query to
{code}
select deptno, sum(sal)
from emp
where job = 'MANAGER'
group by deptno
{code}
and the answer should be about 3, because that is the number of distinct 
deptno values.

I think that the (output) row count of a relational expression should be an 
upper bound on the result of every {{getDistinctRowCount}} metadata query on it.
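
To make the upper-bound idea concrete, here is a minimal sketch in plain Java. The class {{DistinctRowCountBound}} and the method {{capByRowCount}} are hypothetical names for illustration, not Calcite API. It assumes the two estimates have already been obtained (for instance via {{RelMetadataQuery}}), and that either may be null when unknown.

```java
public class DistinctRowCountBound {
  /**
   * Hypothetical helper (not part of Calcite): clamps a distinct-row-count
   * estimate so that it never exceeds the output row count estimate.
   * Either argument may be null, meaning "unknown".
   */
  public static Double capByRowCount(Double distinctRowCount, Double rowCount) {
    if (distinctRowCount == null) {
      return null; // nothing to clamp; stay unknown
    }
    if (rowCount == null) {
      return distinctRowCount; // no bound available
    }
    return Math.min(distinctRowCount, rowCount);
  }

  public static void main(String[] args) {
    // Like case.1: the query returns at most one row, so an estimate
    // of 14 distinct values is clamped to 1.
    System.out.println(capByRowCount(14.0, 1.0));
    // An estimate already below the row count is left unchanged.
    System.out.println(capByRowCount(3.0, 5.0));
  }
}
```

Whether such a clamp belongs in each handler or in one central place is a design choice this sketch does not settle.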

> RelMdDistinctRowCount aggregates implementation problems
> --------------------------------------------------------
>
>                 Key: CALCITE-7083
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7083
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.40.0
>            Reporter: Claude Brisson
>            Assignee: Silun Dong
>            Priority: Major
>
> The default implementation of getDistinctRowCount for aggregates has several 
> problems:
> - when determining the pushable predicates, it makes the assumption that the 
> aggregate group key is a zero-based range, which is not necessarily the case 
> (the indices in the aggregate group key are the child indices, the predicates 
> are expressed in terms of the zero-based output range)
> - if there is any aggregated column in the queried group key, then it makes 
> no sense to query the distinct values on the aggregate input, the handler 
> should return null or (at most) the full cardinal of the aggregate



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
