[ 
https://issues.apache.org/jira/browse/CALCITE-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893963#comment-17893963
 ] 

Julian Hyde commented on CALCITE-6640:
--------------------------------------

As an aside. I was surprised to find that 'select a as a0, a as a1, b as b0, b 
as b1, c as c0, c as c1 from (select distinct a, b, c from t)' (3 groups of 2 
columns has 2^3 = 8 keys) had fewer keys than 'select a as a0, a as a1, a as 
a2, b as b0, b as b1, b as b2 from (select distinct a, b from t)' (2 groups of 
3 columns has 3^2 = 9 keys), so I kicked off an exploration of combinatorics 
and optimization.

If there there are X groups of Y columns then the number of keys is X^Y. I had 
assumed that if X*Y is constant (e.g. there are 6 fields arranged into groups 
of size X) then X^Y was maximized when X = 2. In this case, X^Y is maximized 
when X=3.

I found a [fascinating video|https://www.youtube.com/watch?v=zdAJXil-NvA] on 
maximizing X^Y when X+Y=constant, and it leads to the Lambert W function.

> RelMdUniqueKeys grows exponentially when key columns are repeated in 
> projections
> --------------------------------------------------------------------------------
>
>                 Key: CALCITE-6640
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6640
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> Consider the following table where empno is a unique key column.
> {code:sql}
> CREATE TABLE emp (
>  empno INT, 
>  ename VARCHAR, 
>  job VARCHAR
>  PRIMARY KEY (empno));
> {code}
> The results of RelMetadataQuery#getUniqueKeys for the following queries are 
> as follows:
> {code:sql}
> SELECT empno FROM emp;
> {0}
> SELECT ename, empno FROM emp;
> {1} 
> SELECT empno, ename, empno FROM emp;
> {0}, {2}, {0, 2}
> SELECT empno, ename, empno, empno FROM emp;
> {0}, {2}, {3}, {0, 2}, {0 3}, {2, 3}, {0, 2, 3}
> {code}
> When key columns are repeated in the project the result grows exponentially. 
> This makes the unique key computation very expensive when there are many keys 
> or when keys are repeated multiple times. The problem can lead to OOM errors 
> and queries/rules hanging forever while trying to extract the keys.
> Observe, that the results above are not minimal so currently we are creating 
> and returning a lot of redundant information.
> {noformat}
> {0}, {2}, {3}, {0, 2}, {0 3}, {2, 3}, {0, 2, 3}
> {noformat}
> If we know that \{0\}, \{2\}, and \{3\} are unique keys individually then any 
> superset of those is also a unique key so it is sufficient to return just 
> those.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to