[ 
https://issues.apache.org/jira/browse/CALCITE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905227#comment-17905227
 ] 

Stamatis Zampetakis commented on CALCITE-6704:
----------------------------------------------

I explored two slightly different approaches for restricting the number of keys 
returned by the {{RelMdUniqueKeys}} handler.

The first approach (under 
[PR#4089|https://github.com/apache/calcite/pull/4089]), treats the limit as an 
implementation detail of the {{RelMdUniqueKeys}} handler. This approach affects 
only the default metadata handler that resides in Calcite and does not have 
much impact on other {{MetadataHandler}} implementations that derive 
{{BuiltInMetadata.UniqueKeys}} which may exist in dependent projects. This 
approach achieves the goal of restricting the number of returned keys and keeps 
the behavior changes to a minimum. In this approach, the limit can be 
configured only once, when registering the metadata provider, and cannot be 
tuned per call to the {{RelMetadataQuery}} API.

The second approach (under 
[https://github.com/zabetak/calcite/tree/CALCITE-6704-final-b]), treats the 
limit as a parameter of {{{}BuiltInMetadata.UniqueKeys{}}}. This approach 
affects all metadata handlers that implement the {{UniqueKeys}} API and comes 
with notable breaking changes that will require code changes in dependent 
projects that use this API. It allows more fine-grained tuning of the limit 
parameter since it is accompanied also by new methods in {{RelMetadataQuery}} 
API. Note, that we could avoid having a new {{Config}} interface and just add 
an additional integer parameter in the existing methods of 
{{BuiltInMetadata.UniqueKeys}} API but it would still lead to breaking changes.

In fact, given that the exponential logic may not be present in 
{{MetadataHandler}} implemented in other projects keeping the limit 
encapsulated in {{RelMdUniqueKeys}} feels natural. In addition, I get the 
feeling that the limit is not really something to be tuned on a per call basis 
but its more like a safety valve that we should rarely touch, if ever. For 
these reasons, I heavily lean towards the first approach but if other feel 
differently I am open to discuss more.

> Add limit parameter to restrict the result size of UniqueKeys metadata
> ----------------------------------------------------------------------
>
>                 Key: CALCITE-6704
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6704
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.38.0
>            Reporter: Stamatis Zampetakis
>            Assignee: Stamatis Zampetakis
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.39.0
>
>
> In some cases, the number of unique keys grows exponentially for certain 
> relational expressions.
> Consider a table with a composite key \{k1, k2, k3\} and the following query:
> {code:sql}
> SELECT k1, k1, k2, k2, k3, k3 FROM composite
> {code}
> The number of minimal unique keys for this query is 2^3:
> {noformat}
> {0, 2, 4}, {0, 3, 4}, {1, 2, 4}, {1, 3, 4}, {0, 2, 5}, {0, 3, 5}, {1, 2, 5}, 
> {1, 3, 5}
> {noformat}
> In this case, the size of *minimal keys* is exponential to the number of 
> columns in the composite key. The number of keys is f(x,y) = x^y where _x_ is 
> the number of repetitions of each column and _y_ is the number of columns in 
> the composite key. The example is taken out from CALCITE-6640.
> A large number of unique keys may occur in other use-cases as well. Computing 
> the entire result set may become prohibitively expensive and cause OOM and 
> other failures. 
> To prevent this from happening we propose to add a new  parameter to the 
> UniqueKeys  interface allowing the users to specify a limit on the size of 
> the result set.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to