Ian, I think you're right, this looks like an issue. Maybe your scenario is not the most typical (I think usually the problematic Filter would be pushed down inside the Join, so the areColumnsUnique situation would be avoided), but it is a plausible one. Can you please create a ticket to track it? Could you provide a unit test showing your problematic example?
IMO we cannot simply remove the pullup improvement introduced in CALCITE-3428 [1], otherwise we'd miss some valid optimizations (i.e. the use cases presented in that ticket). However, it seems this optimization can be counter-productive in cases like yours. I have the impression that the ColumnUniqueness logic should be "if the original columns are unique, then return true; otherwise try again with the decorated columns with the pullup info (if they are different from the original columns)". I think that would solve your issue without breaking any of the cases from CALCITE-3428. Moreover, this logic could probably be centralized into a single location (RelMetadataQuery#areColumnsUnique [2]), rather than copy-pasted in every single method of RelMdColumnUniqueness (as it happens today with all the calls to decorateWithConstantColumnsFromPredicate [3]), i.e. something like [4] . [1] https://issues.apache.org/jira/browse/CALCITE-3428 [2] https://github.com/apache/calcite/blob/48648b391367eeb7910e934cc73dabbe1d7370c0/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L604 [3] https://github.com/apache/calcite/blob/48648b391367eeb7910e934cc73dabbe1d7370c0/core/src/main/java/org/apache/calcite/rel/metadata/RelMdColumnUniqueness.java#L503 [4] https://github.com/apache/calcite/pull/4232 On Fri, Mar 7, 2025 at 10:52 PM Steven Phillips <ste...@dremio.com.invalid> wrote: > > > > This gets back to my “What is this supposed to do?” > > > I can confidently say it's the first one: "Is this set of keys, when taken > together, unique"? > > Sorry, earlier, I guess I misunderstood what you meant by join analysis. > You are referring to the predicate-constant thing in the context of joins. > I agree, that seems wrong to me, and in both join and filter cases, it > makes the uniqueness test more conservative than it needs to be. > > On Fri, Mar 7, 2025 at 2:28 PM Ian Bertolacci > <ian.bertola...@workday.com.invalid> wrote: > > > That example makes sense but isn’t directly related to the issue here. > > In this example: `select T1.id, count(*) from T1 inner join T2 on > > T1.foreignKey = T2.ID where T2.foo = 1234 group by T1.id` > > AggregateRemoveRule wants to know if `T1.id` after the filter is unique. > > The answer should be “yes” because T1 is not expanded by the join (in > these > > examples, all ID fields are unique at the TableScan level). > > For example: > > T1 > > ID | foreignKey > > 1 | 10 > > 2 | 10 > > 3 | 20 > > 4 | 20 > > > > T2 > > ID (lets ignore foo) > > 10 > > 20 > > > > T1 inner join T2 on T1.foreignKey = T2.id > > T1.ID | T2.ID > > 1 | 10 > > 2 | 10 > > 3 | 20 > > 4 | 20 > > > > And the analyser is already aware of this. > > I ask the join “Is T1.id unique” it will say “yes”; and if I ask “is > T2.id > > unique” it will say “no”, because the non-uniqueness of the foreign key > > field negates any uniqueness on the opposite/ primary key side. > > > > But if you ask the filter “is T1.id unique” it will ask the join “is > > {T1.id T2.foo}” unique. > > The join looks at T1.id and sees that its unique, then looks at T2.foo > and > > sees that its not unique. > > And for join, it “ands” these together, so if any non-unique column > > exists, then it claims that the whole key-set is not unique. > > > > This gets back to my “What is this supposed to do?” > > Is it answering the question “Is this set of keys, when taken together, > > unique?” or is it answering the question “are *all* of these keys > > *independently* unique?” > > If it’s the first, then the join analysis is too conservative, it’s the > > second, then the table scan analysis is incorrect (I think) > > > > To me, I feel like the analysis could be as simple as “Are any of these > > columns unique after the join?” if yes, then the key set as a whole is > > unique. > > > > -Ian > > > > >