[
https://issues.apache.org/jira/browse/CALCITE-6882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ian Bertolacci updated CALCITE-6882:
------------------------------------
Description:
There is some overly conservative logic in determining uniqueness of columns in
RelNodes which have joins beneath them (not necessarily as children)
In this case, this occurs when a filter condition involving the non-unique side
of a join can engage in predicate pullup/refinement.
Demonstration: inner join between emp and dept
testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join
testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a filter
after the filter dep.name using `=` (this demonstrates the bug)
testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a filter
after the filter dep.name using `!=`
In all three cases, the field position 0 (empno) should be listed as unique.
But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports
it as not-unique.
Specifically, testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter
asks if column 0 is unique after the project, which asks if column 0 is unique
after the filter; the pullup then adds column 10 to the bitset for unique
columns. Column 10 is not unique, and so the result is that column 0 is marked
as not unique.
In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the `!=`
operator precludes the predicate pullup/refinement, and so the bitset is
unmodified, and the analysis correctly marks column 0 after the join as unique.
it seems like there is a combination of issues:
# Join being too conservative
# Incorrect processing for columns which can engage in predicate
pullup/refinement.
[Spawned by discussion in mailing
list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]
Tests inside RelMetadataTest:
{code:java}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
/** Project(EMPNO=[$0], DEPTNO=[$9])
* Join(condition=[=($7, $9)], joinType=[inner])
* TableScan(table=[[CATALOG, SALES, EMP]])
* TableScan(table=[[CATALOG, SALES, DEPT]])
*/
sql(
"select emp.empno, dept.deptno\n"
+ "from emp\n"
+ " inner join dept\n"
+ "on emp.deptno = dept.deptno"
)
.assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
.assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
/** Project(EMPNO=[$0], DEPTNO=[$9])
* Filter(condition=[=($10, 'foo')])
* Join(condition=[=($7, $9)], joinType=[inner])
* TableScan(table=[[CATALOG, SALES, EMP]])
* TableScan(table=[[CATALOG, SALES, DEPT]])
*/
sql(
"select emp.empno, dept.deptno\n"
+ "from emp\n"
+ " inner join dept\n"
+ "on emp.deptno = dept.deptno\n"
+ "where dept.name = 'foo'"
)
.assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
.assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
/** Project(EMPNO=[$0], DEPTNO=[$9])
* Filter(condition=[!=($10, 'foo')])
* Join(condition=[=($7, $9)], joinType=[inner])
* TableScan(table=[[CATALOG, SALES, EMP]])
* TableScan(table=[[CATALOG, SALES, DEPT]])
*/
sql(
"select emp.empno, dept.deptno\n"
+ "from emp\n"
+ " inner join dept\n"
+ "on emp.deptno = dept.deptno\n"
+ "where dept.name <> 'foo'"
)
.assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
.assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}{code}
was:
There is some overly conservative logic in determining uniqueness of columns in
RelNodes which have joins beneath them (not necessarily as children)
In this case, this occurs when a filter condition involving the non-unique side
of a join can engage in predicate pullup/refinement.
Demonstration: inner join between emp and dept
testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join
testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a filter
after the filter dep.name using `=` (this demonstrates the bug)
testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a filter
after the filter dep.name using `!=`
In all three cases, the field position 0 (empno) should be listed as unique.
But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports
it as not-unique.
Specifically, testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter
asks if column 0 is unique after the project, which asks if column 0 is unique
after the filter; the pullup then adds column 10 to the bitset for unique
columns. Column 10 is not unique, and so the result is that column 0 is marked
as not unique.
In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the `!=`
operator precludes the predicate pullup/refinement, and so the bitset is
unmodified, and the analysis correctly marks column 0 after the join as unique.
it seems like there is a combination of issues:
# Join being too conservative
# Incorrect processing for columns which can engage in predicate
pullup/refinement.
[Spawned by discussion in mailing
list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]
Tests inside RelMetadataTest:
{code:java}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
/** Project(EMPNO=[$0], DEPTNO=[$9])
* Join(condition=[=($7, $9)], joinType=[inner])
* TableScan(table=[[CATALOG, SALES, EMP]])
* TableScan(table=[[CATALOG, SALES, DEPT]])
*/
sql(
"select emp.empno, dept.deptno\n"
+ "from emp\n"
+ " inner join dept\n"
+ "on emp.deptno = dept.deptno"
)
.assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
.assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
/** Project(EMPNO=[$0], DEPTNO=[$9])
* Filter(condition=[=($10, 'foo')])
* Join(condition=[=($7, $9)], joinType=[inner])
* TableScan(table=[[CATALOG, SALES, EMP]])
* TableScan(table=[[CATALOG, SALES, DEPT]])
*/
sql(
"select emp.empno, dept.deptno\n"
+ "from emp\n"
+ " inner join dept\n"
+ "on emp.deptno = dept.deptno\n"
+ "where dept.name = 'foo'"
)
.assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
.assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}
@Test void testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
/** Project(EMPNO=[$0], DEPTNO=[$9])
* Filter(condition=[!=($10, 'foo')])
* Join(condition=[=($7, $9)], joinType=[inner])
* TableScan(table=[[CATALOG, SALES, EMP]])
* TableScan(table=[[CATALOG, SALES, DEPT]])
*/
sql(
"select emp.empno, dept.deptno\n"
+ "from emp\n"
+ " inner join dept\n"
+ "on emp.deptno = dept.deptno\n"
+ "where dept.name <> 'foo'"
)
.assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
.assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
}{code}
> RelMdColumnUniqueness incorrectly claims fields are not unique if constant
> refinement occurs in a node above join
> -----------------------------------------------------------------------------------------------------------------
>
> Key: CALCITE-6882
> URL: https://issues.apache.org/jira/browse/CALCITE-6882
> Project: Calcite
> Issue Type: Bug
> Reporter: Ian Bertolacci
> Priority: Major
>
> There is some overly conservative logic in determining uniqueness of columns
> in RelNodes which have joins beneath them (not necessarily as children)
> In this case, this occurs when a filter condition involving the non-unique
> side of a join can engage in predicate pullup/refinement.
> Demonstration: inner join between emp and dept
> testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter has no filter after the join
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter has a
> filter after the filter dep.name using `=` (this demonstrates the bug)
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter has a
> filter after the filter dep.name using `!=`
> In all three cases, the field position 0 (empno) should be listed as unique.
> But testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter reports
> it as not-unique.
> Specifically,
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter asks if
> column 0 is unique after the project, which asks if column 0 is unique after
> the filter; the pullup then adds column 10 to the bitset for unique columns.
> Column 10 is not unique, and so the result is that column 0 is marked as not
> unique.
> In testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter, the
> `!=` operator precludes the predicate pullup/refinement, and so the bitset is
> unmodified, and the analysis correctly marks column 0 after the join as
> unique.
> it seems like there is a combination of issues:
> # Join being too conservative
> # Incorrect processing for columns which can engage in predicate
> pullup/refinement.
> [Spawned by discussion in mailing
> list|https://lists.apache.org/thread/fmqyzokcqzv7vzq8lvg35h9opf5kwo33]
> Tests inside RelMetadataTest:
> {code:java}
> @Test void testUniqueKeysOnInnerJoinOnKeysWithParentNoFilter() {
> /** Project(EMPNO=[$0], DEPTNO=[$9])
> * Join(condition=[=($7, $9)], joinType=[inner])
> * TableScan(table=[[CATALOG, SALES, EMP]])
> * TableScan(table=[[CATALOG, SALES, DEPT]])
> */
> sql(
> "select emp.empno, dept.deptno\n"
> + "from emp\n"
> + " inner join dept\n"
> + "on emp.deptno = dept.deptno"
> )
> .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
> .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }
> @Test void
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueInclusiveFilter() {
> /** Project(EMPNO=[$0], DEPTNO=[$9])
> * Filter(condition=[=($10, 'foo')])
> * Join(condition=[=($7, $9)], joinType=[inner])
> * TableScan(table=[[CATALOG, SALES, EMP]])
> * TableScan(table=[[CATALOG, SALES, DEPT]])
> */
> sql(
> "select emp.empno, dept.deptno\n"
> + "from emp\n"
> + " inner join dept\n"
> + "on emp.deptno = dept.deptno\n"
> + "where dept.name = 'foo'"
> )
> .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
> .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }
> @Test void
> testUniqueKeysOnInnerJoinOnKeysWithParentNonUniqueExclusiveFilter() {
> /** Project(EMPNO=[$0], DEPTNO=[$9])
> * Filter(condition=[!=($10, 'foo')])
> * Join(condition=[=($7, $9)], joinType=[inner])
> * TableScan(table=[[CATALOG, SALES, EMP]])
> * TableScan(table=[[CATALOG, SALES, DEPT]])
> */
> sql(
> "select emp.empno, dept.deptno\n"
> + "from emp\n"
> + " inner join dept\n"
> + "on emp.deptno = dept.deptno\n"
> + "where dept.name <> 'foo'"
> )
> .assertThatAreColumnsUnique(ImmutableBitSet.of(0), is(true))
> .assertThatAreColumnsUnique(ImmutableBitSet.of(1), is(false));
> }{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)