[ 
https://issues.apache.org/jira/browse/CALCITE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17934047#comment-17934047
 ] 

Ian Bertolacci commented on CALCITE-6882:
-----------------------------------------

Some pertinant exerpts from the email thread:

>From Ian Bertolacci:
{quote}In this example: `select T1.id, count(*) from T1 inner join T2 on 
T1.foreignKey = T2.ID where T2.foo = 1234 group by T1.id`
AggregateRemoveRule wants to know if `T1.id` after the filter is unique. The 
answer should be “yes” because T1 is not expanded by the join (in these 
examples, all ID fields are unique at the TableScan level).
For example:
{{T1}}
{{ID | foreignKey}}
{{ 1 | 10}}
{{ 2 | 10}}
{{ 3 | 20}}
{{ 4 | 20}}

{{T2}}
{{ID (lets ignore foo)}}
{{10}}
{{20}}

{{T1 inner join T2 on T1.foreignKey = T2.id}}
{{T1.ID | T2.ID}}
{{    1 | 10}}
{{    2 | 10}}
{{    3 | 20}}
{{    4 | 20}}

And the analyser is already aware of this.
I ask the join “Is T1.id unique” it will say “yes”; and if I ask “is T2.id 
unique” it will say “no”, because the non-uniqueness of the foreign key field 
negates any uniqueness on the opposite/ primary key side.

But if you ask the filter “is T1.id unique” it will ask the join “is \{T1.id 
T2.foo}” unique.
The join looks at T1.id and sees that its unique, then looks at T2.foo and sees 
that its not unique.
And for join, it “ands” these together, so if any non-unique column exists, 
then it claims that the whole key-set is not unique.
{quote}
 

>From Steven Phillips
{quote}> ... “What is [areColumnsUnique] supposed to do?"  Is it answering the 
question “Is this set of keys, when taken together, unique?” or is it answering 
the question “are *all* of these keys *independently* unique?”

I can confidently say it's the first one: "Is this set of keys, when taken
together, unique"
{quote}

> RelMdColumnUniqueness incorrectly claims fields are not unique if constant 
> refinement occurs in a node above join
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-6882
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6882
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Ian Bertolacci
>            Priority: Major
>
> There is some overly conservative logic in determining uniqueness of columns 
> in RelNodes which have joins beneath them (not necessarily as children)
> In this case, this occurs when a filter condition involving the non-unique 
> side of a join can engage in predicate pullup/refinement.
> Demonstration: inner join between emp and dept
> testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a 
> filter after the filter dep.name using `=` (this demonstrates the bug)
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a 
> filter after the filter dep.name using `!=`
> In all three cases, the field position 0 (empno) should be listed as unique.
> But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports 
> it as not-unique.
> Specifically, 
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter asks if 
> column 0 is unique after the project, which asks if column 0 is unique after 
> the filter; the pullup then adds column 10 to the bitset for unique columns. 
> Column 10 is not unique, and so the result is that column 0 is marked as not 
> unique. 
> In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the 
> `!=` operator precludes the predicate pullup/refinement, and so the bitset is 
> unmodified, and the analysis correctly marks column 0 after the join as 
> unique.
> it seems like there is a combination of issues:
>  # Join being too conservative
>  # Incorrect processing for columns which can engage in predicate 
> pullup/refinement.
> [Spawned by discussion in mailing 
> list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]
> Tests inside RelMetadataTest:
> {code:java}
> @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
>   /** Project(EMPNO=[$0], DEPTNO=[$9])
>     *   Join(condition=[=($7, $9)], joinType=[inner])
>     *     TableScan(table=[[CATALOG, SALES, EMP]])
>     *     TableScan(table=[[CATALOG, SALES, DEPT]])
>     */
>   sql(
>       "select emp.empno, dept.deptno\n"
>           + "from emp\n"
>           + "  inner join dept\n"
>           + "on emp.deptno = dept.deptno"
>   )
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }
> @Test void 
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
>   /** Project(EMPNO=[$0], DEPTNO=[$9])
>     *   Filter(condition=[=($10, 'foo')])
>     *     Join(condition=[=($7, $9)], joinType=[inner])
>     *       TableScan(table=[[CATALOG, SALES, EMP]])
>     *       TableScan(table=[[CATALOG, SALES, DEPT]])
>     */
>   sql(
>     "select emp.empno, dept.deptno\n"
>         + "from emp\n"
>         + "  inner join dept\n"
>         + "on emp.deptno = dept.deptno\n"
>         + "where dept.name = 'foo'"
>   )
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }
> @Test void 
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
>   /** Project(EMPNO=[$0], DEPTNO=[$9])
>     *   Filter(condition=[!=($10, 'foo')])
>     *     Join(condition=[=($7, $9)], joinType=[inner])
>     *       TableScan(table=[[CATALOG, SALES, EMP]])
>     *       TableScan(table=[[CATALOG, SALES, DEPT]])
>     */
>   sql(
>       "select emp.empno, dept.deptno\n"
>           + "from emp\n"
>           + "  inner join dept\n"
>           + "on emp.deptno = dept.deptno\n"
>           + "where dept.name <> 'foo'"
>   )
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
>       .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to