[
https://issues.apache.org/jira/browse/CALCITE-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893545#comment-17893545
]
Stamatis Zampetakis commented on CALCITE-6640:
----------------------------------------------
OK, I agree, let's focus on the minimal keys issue first. It seems that the only
open question is whether we want the RelMdUniqueKeys metadata handler to
enforce minimality. Since minimality can be easily checked by inspecting the
result of the metadata handler, I find it more appropriate to ensure the
correctness of the code/contract via regular unit tests rather than assertions.
The metadata handler code is rather self-contained, so I don't think we need to
rely on the assertion mechanism.
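As a rough illustration of how a unit test could verify minimality from the
handler's result alone, here is a minimal sketch. It assumes keys are modelled
as plain Set<Integer> column ordinals rather than Calcite's bit sets, and the
UniqueKeyChecks class and isMinimal method are hypothetical names, not existing
Calcite code.

```java
import java.util.Set;

/** Hypothetical test helper for illustration; not part of Calcite. */
final class UniqueKeyChecks {
  private UniqueKeyChecks() {}

  /**
   * Returns true if no key in the given collection contains another,
   * different key from the same collection.
   */
  static boolean isMinimal(Set<Set<Integer>> keys) {
    for (Set<Integer> a : keys) {
      for (Set<Integer> b : keys) {
        // a is redundant if it is a strict superset of some other key b
        if (!a.equals(b) && a.containsAll(b)) {
          return false;
        }
      }
    }
    return true;
  }
}
```

A test could convert the handler's result into this shape and assert that
isMinimal holds, which keeps the check out of the production code path.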
> RelMdUniqueKeys grows exponentially when key columns are repeated in
> projections
> --------------------------------------------------------------------------------
>
> Key: CALCITE-6640
> URL: https://issues.apache.org/jira/browse/CALCITE-6640
> Project: Calcite
> Issue Type: Bug
> Components: core
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
>
> Consider the following table where empno is a unique key column.
> {code:sql}
> CREATE TABLE emp (
> empno INT,
> ename VARCHAR,
> job VARCHAR,
> PRIMARY KEY (empno));
> {code}
> The results of RelMetadataQuery#getUniqueKeys for the following queries are
> as follows:
> {code:sql}
> SELECT empno FROM emp;
> {0}
> SELECT ename, empno FROM emp;
> {1}
> SELECT empno, ename, empno FROM emp;
> {0}, {2}, {0, 2}
> SELECT empno, ename, empno, empno FROM emp;
> {0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
> {code}
> When key columns are repeated in the projection, the result grows exponentially.
> This makes the unique key computation very expensive when there are many keys
> or when keys are repeated multiple times. The problem can lead to OOM errors
> and queries/rules hanging forever while trying to extract the keys.
> Observe that the results above are not minimal, so currently we are creating
> and returning a lot of redundant information.
> {noformat}
> {0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
> {noformat}
> If we know that {0}, {2}, and {3} are unique keys individually, then any
> superset of those is also a unique key, so it is sufficient to return just
> those.
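The reduction described in the issue (dropping every key that is a superset of
another key) can be sketched as follows. This is only an illustration under
stated assumptions: keys are modelled as Set<Integer> column ordinals, and the
MinimalKeys class and minimize method are hypothetical names, not Calcite APIs.
Filtering after the fact would not by itself avoid the cost of generating the
redundant supersets in the first place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Calcite. */
final class MinimalKeys {
  private MinimalKeys() {}

  /** Keeps only the keys that do not contain another key from the input. */
  static Set<Set<Integer>> minimize(Set<Set<Integer>> keys) {
    // Visit smaller keys first so that any subset is kept before its supersets.
    List<Set<Integer>> sorted = new ArrayList<>(keys);
    sorted.sort(Comparator.comparingInt(Set::size));

    Set<Set<Integer>> minimal = new LinkedHashSet<>();
    for (Set<Integer> candidate : sorted) {
      boolean redundant = false;
      for (Set<Integer> kept : minimal) {
        if (candidate.containsAll(kept)) {
          // candidate is a superset of a key we already kept
          redundant = true;
          break;
        }
      }
      if (!redundant) {
        minimal.add(candidate);
      }
    }
    return minimal;
  }
}
```

Applied to the seven keys listed for the last query, this keeps only {0}, {2},
and {3}.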