[ 
https://issues.apache.org/jira/browse/CALCITE-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892306#comment-17892306
 ] 

Julian Hyde commented on CALCITE-6640:
--------------------------------------

I have worried about the large number of keys in the past. I don't recall which 
Jira case (maybe you can find it), but I think it was related to the {{Values}} 
relational operator. If a {{Values}} has 0 or 1 row, then every combination of 
columns is a unique key.

I thought that I had solved it by making keys minimal. If [a, c] is a unique 
key then so is [a, c, d] and [a, b, c]. So there is no need to derive the fact 
that [a, c, d] is unique, or to store it. (If someone calls 
{{areColumnsUnique(r, [a, c, d])}} of course it should say yes.) Can you solve 
the current case by making keys minimal?

There is a different scenario, relating to duplicate columns, where a quadratic 
(not exponential) number of keys arises and I can't think what to do with it. 
Consider the query
{code:java}
select a as a0, a as a1, a as a2, a as a3, a as a4,
  b as b0, b as b1, b as b2, b as b3, b as b4
from (select distinct a, b from t)
{code}
Any one of the 25 combinations of {{a[i]}} and {{b[j]}} columns is unique, but 
none of them contains a simpler key.

> RelMdUniqueKeys grows exponentially when key columns are repeated in 
> projections
> --------------------------------------------------------------------------------
>
>                 Key: CALCITE-6640
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6640
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> Consider the following table where empno is a unique key column.
> {code:sql}
> CREATE TABLE emp (
>  empno INT, 
>  ename VARCHAR, 
>  job VARCHAR
>  PRIMARY KEY (empno));
> {code}
> The results of RelMetadataQuery#getUniqueKeys for the following queries are 
> as follows:
> {code:sql}
> SELECT empno FROM emp;
> {0}
> SELECT ename, empno FROM emp;
> {1} 
> SELECT empno, ename, empno FROM emp;
> {0}, {2}, {0, 2}
> SELECT empno, ename, empno, empno FROM emp;
> {0}, {2}, {3}, {0, 2}, {0 3}, {2, 3}, {0, 2, 3}
> {code}
> When key columns are repeated in the project the result grows exponentially. 
> This makes the unique key computation very expensive when there are many keys 
> or when keys are repeated multiple times. The problem can lead to OOM errors 
> and queries/rules hanging forever while trying to extract the keys.
> Observe, that the results above are not minimal so currently we are creating 
> and returning a lot of redundant information.
> {noformat}
> {0}, {2}, {3}, {0, 2}, {0 3}, {2, 3}, {0, 2, 3}
> {noformat}
> If we know that \{0\}, \{2\}, and \{3\} are unique keys individually then any 
> superset of those is also a unique key so it is sufficient to return just 
> those.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to