[jira] [Commented] (CALCITE-6640) RelMdUniqueKeys grows exponentially when key columns are repeated in projections

Mihai Budiu (Jira) Wed, 23 Oct 2024 15:35:17 -0700


    [ https://issues.apache.org/jira/browse/CALCITE-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17892308#comment-17892308 ]


Mihai Budiu commented on CALCITE-6640:
--------------------------------------

Indeed, the JavaDoc on getUniqueKeys says this:

{code:java}
     * Determines the set of unique minimal keys for this expression. A key is
     * represented as an {@link org.apache.calcite.util.ImmutableBitSet}, where
     * each bit position represents a 0-based output column ordinal.
     *
     * <p>Note that a unique key plus other columns is still unique.
     * Therefore, all columns are unique in a table with a unique key
     * consisting of the empty set, as is the case for zero-row and
     * single-row tables. The converse is not true: a table with all
     * columns unique does not necessarily have the empty set as a key -
     * that is never true with multi-row tables.
{code}
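To make this contract concrete, here is a minimal sketch that uses java.util.BitSet in place of Calcite's ImmutableBitSet. The class UniqueKeyContract and its methods are hypothetical and not part of Calcite. A column set counts as unique if it contains at least one key, which is why an empty key makes every column set unique:

{code:java}
import java.util.BitSet;
import java.util.List;

// Hypothetical helper (not part of Calcite) modelling the getUniqueKeys
// contract with java.util.BitSet instead of Calcite's ImmutableBitSet.
final class UniqueKeyContract {

  // Returns true if every bit set in "subset" is also set in "superset".
  static boolean isSubset(BitSet subset, BitSet superset) {
    BitSet rest = (BitSet) subset.clone();
    rest.andNot(superset); // bits that are in subset but not in superset
    return rest.isEmpty();
  }

  // A column set is unique if it contains at least one key:
  // a unique key plus other columns is still unique.
  static boolean isUnique(BitSet columns, List<BitSet> keys) {
    for (BitSet key : keys) {
      if (isSubset(key, columns)) {
        return true;
      }
    }
    return false;
  }

  // Builds a BitSet from 0-based column ordinals.
  static BitSet of(int... ordinals) {
    BitSet bits = new BitSet();
    for (int ordinal : ordinals) {
      bits.set(ordinal);
    }
    return bits;
  }

  public static void main(String[] args) {
    List<BitSet> empKeys = List.of(of(0)); // emp: empno (ordinal 0) is the key
    System.out.println(isUnique(of(0, 1), empKeys)); // true
    System.out.println(isUnique(of(1, 2), empKeys)); // false
    // A zero-row or single-row table has the empty set as a key,
    // so every column set is unique.
    List<BitSet> singleRowKeys = List.of(new BitSet());
    System.out.println(isUnique(of(2), singleRowKeys)); // true
  }
}
{code}

The converse direction is deliberately not modelled: knowing that every single column is unique says nothing about whether the empty set is a key.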

> RelMdUniqueKeys grows exponentially when key columns are repeated in projections
> --------------------------------------------------------------------------------
>
>                 Key: CALCITE-6640
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6640
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>
> Consider the following table where empno is a unique key column.
> {code:sql}
> CREATE TABLE emp (
>  empno INT, 
>  ename VARCHAR, 
>  job VARCHAR,
>  PRIMARY KEY (empno));
> {code}
> RelMetadataQuery#getUniqueKeys returns the following unique keys for these
> queries:
> {code:sql}
> SELECT empno FROM emp;
> -- {0}
> SELECT ename, empno FROM emp;
> -- {1}
> SELECT empno, ename, empno FROM emp;
> -- {0}, {2}, {0, 2}
> SELECT empno, ename, empno, empno FROM emp;
> -- {0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
> {code}
> When key columns are repeated in the projection, the result grows
> exponentially: a key column that appears n times yields 2^n - 1 keys. This
> makes the unique key computation very expensive when there are many keys or
> when keys are repeated multiple times. The problem can lead to OOM errors and
> to queries or rules hanging forever while trying to extract the keys.
> Observe that the results above are not minimal, so we currently create and
> return a lot of redundant information.
> {noformat}
> {0}, {2}, {3}, {0, 2}, {0, 3}, {2, 3}, {0, 2, 3}
> {noformat}
> If we know that {0}, {2}, and {3} are each unique keys on their own, then any
> superset of them is also a unique key, so it is sufficient to return just
> those three.
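
As a hedged illustration of the growth described in the issue, the following sketch models how a single-column key is propagated through a projection. It is a simplified model chosen to reproduce the outputs listed above, not Calcite's RelMdUniqueKeys code; the class ProjectKeys, its methods, and the int[] projection encoding (output ordinal to input ordinal) are hypothetical. The naive version reports every non-empty subset of the output positions that hold the key column, which gives 2^n - 1 keys for n positions, while the minimal version reports each position on its own:

{code:java}
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model (not Calcite code) of unique-key propagation through a
// projection. projection[i] is the input ordinal referenced by output column i.
// Only single-column input keys are handled, for brevity.
final class ProjectKeys {

  // Output positions that reference input column inputCol.
  static List<Integer> positionsOf(int[] projection, int inputCol) {
    List<Integer> positions = new ArrayList<>();
    for (int i = 0; i < projection.length; i++) {
      if (projection[i] == inputCol) {
        positions.add(i);
      }
    }
    return positions;
  }

  // Naive: every non-empty subset of the positions, i.e. 2^n - 1 keys.
  static List<BitSet> naiveKeys(int[] projection, int inputKeyCol) {
    List<Integer> positions = positionsOf(projection, inputKeyCol);
    int n = positions.size();
    List<BitSet> keys = new ArrayList<>();
    for (int mask = 1; mask < (1 << n); mask++) {
      BitSet key = new BitSet();
      for (int j = 0; j < n; j++) {
        if ((mask & (1 << j)) != 0) {
          key.set(positions.get(j));
        }
      }
      keys.add(key);
    }
    return keys;
  }

  // Minimal: each position alone is already a key, i.e. n keys.
  static List<BitSet> minimalKeys(int[] projection, int inputKeyCol) {
    List<BitSet> keys = new ArrayList<>();
    for (int pos : positionsOf(projection, inputKeyCol)) {
      BitSet key = new BitSet();
      key.set(pos);
      keys.add(key);
    }
    return keys;
  }

  public static void main(String[] args) {
    // SELECT empno, ename, empno, empno FROM emp; input ordinals: empno=0, ename=1
    int[] projection = {0, 1, 0, 0};
    System.out.println(naiveKeys(projection, 0));
    // [{0}, {2}, {0, 2}, {3}, {0, 3}, {2, 3}, {0, 2, 3}]
    System.out.println(minimalKeys(projection, 0));
    // [{0}, {2}, {3}]
  }
}
{code}

The naive version returns the same seven keys as the issue (in a different order), while the minimal version grows only linearly with the number of repetitions. For a multi-column input key, the analogous minimal result would pick exactly one output position for each key column.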



