Re: Questions about RelMdColumnUniqueness

Steven Phillips Fri, 07 Mar 2025 13:16:37 -0800

By join analysis, I assume you are referring to
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java#L336


The thing is, a join as the potential to make a unique column no longer
unique, as it could be an expanding join. e.g. joining these two tables on
b = c. Column a is unique on its own, but d is not.

a b      c   d
1 1      1   1
2 1      1   1

a b c  d
1 1 1 1
1 1 1 1
2 1 1 1
2 1 1 1

After the join, the key (a, d) is not unique. So the rule is not too
conservative.

On Fri, Mar 7, 2025 at 12:57 PM Ian Bertolacci
<[email protected]> wrote:

> Two comments:
> First: on our side, we’d like to know if *removing* the use of this
> predicate pullup information would have *negative* consequences for
> correctness.
> If it does not impact correctness: we can override the behavior to not
> consider this information, and move forwards.
>
> Second: a colleague pointed out that join’s notion of unique may also be
> too conservative, depending on what RelMdColumnUniqueness.areColumnsUnique
> is really supposed to be doing.
> If the intention is to check if the bitset of columns forms a unique key,
> then the join analysis is too conservative, since if there are unique
> columns from one side, and non-unique from the other, the whole key is
> considered not unique, even though the presence of *any* unique column
> makes the overall key unique. (Its worth pointing out that this is how
> Statisitcs.isKey works, which is how areColumnsUnique(TableScan) works if
> that RelOptTable does not provide a special uniqueness handler).
> If the intention is to check if *any* column in the bitset is itself
> unique, then the join analysis is also too conservative, for the same
> reasons.
> If the intention is to check if *all* column in the bitset is itself
> unique, then the join analysis is fine but the TableScan’s default analysis
> is incorrect (for reasons mentioned above).
>
> So there’s more than just this predicate pullup at play here.
>

Re: Questions about RelMdColumnUniqueness

Reply via email to