[
https://issues.apache.org/jira/browse/CALCITE-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17905227#comment-17905227
]
Stamatis Zampetakis commented on CALCITE-6704:
----------------------------------------------
I explored two slightly different approaches for restricting the number of keys
returned by the {{RelMdUniqueKeys}} handler.
The first approach (under
[PR#4089|https://github.com/apache/calcite/pull/4089]), treats the limit as an
implementation detail of the {{RelMdUniqueKeys}} handler. This approach affects
only the default metadata handler that resides in Calcite and does not have
much impact on other {{MetadataHandler}} implementations that derive
{{BuiltInMetadata.UniqueKeys}} which may exist in dependent projects. This
approach achieves the goal of restricting the number of returned keys and keeps
the behavior changes to a minimum. In this approach, the limit can be
configured only once, when registering the metadata provider, and cannot be
tuned per call to the {{RelMetadataQuery}} API.
The second approach (under
[https://github.com/zabetak/calcite/tree/CALCITE-6704-final-b]), treats the
limit as a parameter of {{{}BuiltInMetadata.UniqueKeys{}}}. This approach
affects all metadata handlers that implement the {{UniqueKeys}} API and comes
with notable breaking changes that will require code changes in dependent
projects that use this API. It allows more fine-grained tuning of the limit
parameter since it is accompanied also by new methods in {{RelMetadataQuery}}
API. Note, that we could avoid having a new {{Config}} interface and just add
an additional integer parameter in the existing methods of
{{BuiltInMetadata.UniqueKeys}} API but it would still lead to breaking changes.
In fact, given that the exponential logic may not be present in
{{MetadataHandler}} implemented in other projects keeping the limit
encapsulated in {{RelMdUniqueKeys}} feels natural. In addition, I get the
feeling that the limit is not really something to be tuned on a per call basis
but its more like a safety valve that we should rarely touch, if ever. For
these reasons, I heavily lean towards the first approach but if other feel
differently I am open to discuss more.
> Add limit parameter to restrict the result size of UniqueKeys metadata
> ----------------------------------------------------------------------
>
> Key: CALCITE-6704
> URL: https://issues.apache.org/jira/browse/CALCITE-6704
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.38.0
> Reporter: Stamatis Zampetakis
> Assignee: Stamatis Zampetakis
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.39.0
>
>
> In some cases, the number of unique keys grows exponentially for certain
> relational expressions.
> Consider a table with a composite key \{k1, k2, k3\} and the following query:
> {code:sql}
> SELECT k1, k1, k2, k2, k3, k3 FROM composite
> {code}
> The number of minimal unique keys for this query is 2^3:
> {noformat}
> {0, 2, 4}, {0, 3, 4}, {1, 2, 4}, {1, 3, 4}, {0, 2, 5}, {0, 3, 5}, {1, 2, 5},
> {1, 3, 5}
> {noformat}
> In this case, the size of *minimal keys* is exponential to the number of
> columns in the composite key. The number of keys is f(x,y) = x^y where _x_ is
> the number of repetitions of each column and _y_ is the number of columns in
> the composite key. The example is taken out from CALCITE-6640.
> A large number of unique keys may occur in other use-cases as well. Computing
> the entire result set may become prohibitively expensive and cause OOM and
> other failures.
> To prevent this from happening we propose to add a new parameter to the
> UniqueKeys interface allowing the users to specify a limit on the size of
> the result set.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)