Hi Jack,
Appreciate you taking the time to do the review and catching the
predicate with expression. I have changed the code and also added a
test case for it. I have fired the derbyall suite on my codeline to
make sure everything else runs smoothly. Attached is the updated patch
anyways.
thanks,
Mamta
On 5/11/05, Jack Klebanoff <[EMAIL PROTECTED]> wrote:
> The patch does not completely fix the problem. It does not handle the
> case where the exists table column is embedded in an expression. Try the
> following variation on the test select:
>
> select distinct q1."NO1" from IDEPT q1, IDEPT q2
> where ( q2."DISCRIM_DEPT" = 'HardwareDept')
> and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <> ALL
> (select q3."NO1" from IDEPT q3 where ( ABS(q3."REPORTTO_NO") = q2."NO1")))
>
> Because q3."REPORTTO_NO" is inside a call to ABS the code added to
> FromList.returnsAtMostSingleRow does not see it.
>
> I would suggest using
>
> JBitSet referencedTables =
> and.getLeftOperand().getTablesReferenced();
> if( referencedTables.get( existsTableNumber))
> {
>
> predicatesTemp.removeElementAt(predicatesTempIndex);
> break;
> }
>
> instead of
> BinaryRelationalOperatorNode beon =
> (BinaryRelationalOperatorNode)
> and.getLeftOperand();
> ValueNode left = beon.getLeftOperand();
> ValueNode right = beon.getRightOperand();
>
> /* If left or right side of predicate refer to
> exists base table,
> then remove it */
> if ((left instanceof ColumnReference) &&
> ((ColumnReference) left).getTableNumber() ==
> existsTableNumber)
> {
>
> predicatesTemp.removeElementAt(predicatesTempIndex);
> break;
> }
> else if ((right instanceof ColumnReference) &&
> ((ColumnReference) right).getTableNumber()
> == existsTableNumber)
> {
>
> predicatesTemp.removeElementAt(predicatesTempIndex);
> break;
> }
>
> I have tried it out and it seems to work.
>
> Jack Klebanoff
>
> Mamta Satoor wrote:
>
> >Hi,
> >
> >I have a patch for this optimizer bug. Basically, the issue turned out
> >to be the logic for DISTINCT elimination. During the optimization
> >phase, if a query has DISTINCT clause, then impl.sql.compile.FromList
> >class's returnsAtMostSingleRow() method gets called. This method
> >returns true if the method concludes that DISTINCT in the query is
> >redundant (based on a complex logic that decides that the query is
> >going to return distinct rows on its own without the DISTINCT clause.
> >The details of the current logic for DISTINCT elimination can be found
> >in the comments at the method level.)
> >
> >For the query in question in this bug, the method returned true for
> >DISTINCT elimination which is wrong. The explanation is as follows.
> >
> >First of all, I was able to simplify the query reported in the bug to
> >following query.
> >select distinct q1."NO1" from IDEPT q1, IDEPT q2
> >where ( q2."DISCRIM_DEPT" = 'HardwareDept')
> >and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <> ALL
> >(select q3."NO1" from IDEPT q3 where (q3."REPORTTO_NO" = q2."NO1")))
> >
> >This query gets converted to following during optimization
> >select distinct q1."NO1" from IDEPT q1, IDEPT q2
> >where ( q2."DISCRIM_DEPT" = 'HardwareDept')
> >and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and not exists (
> >(select q3."NO1" from IDEPT q3 where
> >( q3."REPORTTO_NO" = q2."NO1" and q3."NO1" = q1."NO1") ) ) ;
> >
> >This optimized query has 4 predicates associated with it
> >q3.reportto_no = q2.no1
> >q2.discrim_dept = 'HardwareDept'
> >q1.descrim_dept = 'SoftwareDept'
> >q1.no1 = q3.no1
> >
> >Next, on this optimized query(since it has DISTINCT clause in it), the
> >returnsAtMostSingleRow() method gets called. The method incorrectly
> >returns true indicating that DISTINCT can be eliminated. The reason
> >for this is that method is looking at predicates that belong to the
> >inside query with the exists clause (which is on table IDEPT q3) to
> >determine DISTINCT elimination for the outer level.
> >
> >The fix is that the predicates from the exists query, (in this
> >particular case, q3."NO1" = q1."NO1" and q3.reportto_no = q2.no1)
> >should not be considered when deciding elimination of DISTINCT in the
> >outer query. That is what the attached patch does.
> >
> >Hope this helps understand the problem and the proposed fix for it.
> >The files impacted by the change are as follows
> >svn stat
> >M java\engine\org\apache\derby\impl\sql\compile\FromList.java
> >M
> >java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> >M
> >java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> >
> >Please send in comments you may have. I have run the existing tests
> >and the patch didn't cause any failures.
> >
> >thanks,
> >Mamta
> >
> >
>
>
Index: java/engine/org/apache/derby/impl/sql/compile/FromList.java
===================================================================
--- java/engine/org/apache/derby/impl/sql/compile/FromList.java (revision
169642)
+++ java/engine/org/apache/derby/impl/sql/compile/FromList.java (working copy)
@@ -1040,7 +1040,7 @@
}
/**
- * Decrement (query block) level (0-based) for
+ * Decrement (query block) level (0-based) for
* all of the tables in this from list.
* This is useful when flattening a subquery.
*
@@ -1069,7 +1069,7 @@
}
}
-
+
/**
* This method is used for both subquery flattening and distinct
* elimination based on a uniqueness condition. For subquery
@@ -1079,7 +1079,7 @@
* any duplicates.
* This is true if every table in the from list is
* (a base table and the set of columns from the table that
- * are in equality comparisons with expressions that do not include
columns
+ * are in equality comparisons with expressions that do not include
columns
* from the same table is a superset of any unique index
* on the table) or an EXISTS FBT. In addition, at least 1 of the
tables
* in the list has a set of columns in equality comparisons with
expressions
@@ -1105,28 +1105,28 @@
* create an array of columns from the table(eqOuterCol)
* (this is used to determine that only one row will be
returned
* from a join)
- *
+ *
* if the current table is the table for the result columns
* set the result columns in the eqOuterCol and
tableColMap
* (if these columns are a superset of a unique
index and
* all joining tables result in only one row, the
* results will be distinct)
* go through all the predicates and update tableColMap
and
- * eqOuterCol with join columns and correlation variables,
+ * eqOuterCol with join columns and correlation variables,
* parameters and constants
* since setting constants, correlation variables and
parameters,
- * reduces the number of columns required for uniqueness
in a
+ * reduces the number of columns required for uniqueness
in a
* multi-column index, they are set for all the tables (if
the
* table is not the result table, in this case only the
column of the
* result table is set)
* join columns are just updated for the column in the row
of the
* joining table.
- *
- * check if the marked columns in tableColMap are a
superset of a unique
- * index
+ *
+ * check if the marked columns in tableColMap are a
superset of a unique
+ * index
* (This means that the join will only produce 1
row when joined
* with 1 row of another table)
- * check that there is a least one table for which the
columns in
+ * check that there is a least one table for which the
columns in
* eqOuterCol(i.e. constant values) are a superset
of a unique index
* (This quarantees that there will be only one
row selected
* from this table).
@@ -1134,8 +1134,8 @@
* Once all tables have been evaluated, check that all the tables
can be
* joined by unique index or will have only one row
*
- *
*
+ *
* @param rcl If non-null, the RCL from the
query block.
* If non-null for
subqueries, then entry can
* be considered
as part of an = comparison.
@@ -1150,8 +1150,8 @@
*
* @exception StandardException Thrown on error
*/
- boolean returnsAtMostSingleRow(ResultColumnList rcl,
- ValueNode
whereClause,
+ boolean returnsAtMostSingleRow(ResultColumnList rcl,
+ ValueNode
whereClause,
PredicateList wherePredicates,
DataDictionary dd)
throws StandardException
@@ -1160,6 +1160,13 @@
int[] tableNumbers;
ColumnReference additionalCR = null;
+ PredicateList predicatesTemp;
+ predicatesTemp = (PredicateList) getNodeFactory().getNode(
+ C_NodeTypes.PREDICATE_LIST, getContextManager());
+ int wherePredicatesSize = wherePredicates.size();
+ for (int index = 0; index < wherePredicatesSize; index++)
+
predicatesTemp.addPredicate((Predicate)wherePredicates.elementAt(index));
+
/* When considering subquery flattening, we are interested
* in the 1st (and only) entry in the RCL. (The RCL will be
* null if result column is not of interest for subquery
flattening.)
@@ -1193,6 +1200,77 @@
{
return false;
}
+ FromBaseTable fbt = (FromBaseTable)
prn.getChildResult();
+ //Following for loop code is to take care of Derby-251
(DISTINCT returns
+ //duplicate rows).
+ //Derby-251 returned duplicate rows because we were
looking at predicates
+ //that belong to existsTable to determine DISTINCT
elimination
+ //
+ //(Check method level comments to understand DISTINCT
elimination rules.)
+ //
+ //For one specific example, consider the query below
+ //select distinct q1."NO1" from IDEPT q1, IDEPT q2
+ //where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+ //and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and (
q1."NO1" <> ALL
+ //(select q3."NO1" from IDEPT q3 where
(q3."REPORTTO_NO" = q2."NO1")))
+ //(select q3."NO1" from IDEPT q3 where (
ABS(q3."REPORTTO_NO") = q2."NO1")))
+ //
+ //Table IDEPT in the query above has a primary key
defined on column "NO1"
+ //This query gets converted to following during
optimization
+ //
+ //select distinct q1."NO1" from IDEPT q1, IDEPT q2
+ //where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+ //and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and not
exists (
+ //(select q3."NO1" from IDEPT q3 where
+ //( ( ABS(q3."REPORTTO_NO") = q2."NO1") and q3."NO1"
= q1."NO1") ) ) ;
+ //
+ //For the optimized query above, Derby generates
following predicates.
+ //ABS(q3.reportto_no) = q2.no1
+ //q2.discrim_dept = 'HardwareDept'
+ //q1.descrim_dept = 'SoftwareDept'
+ //q1.no1 = q3.no1
+ //The predicate ABS(q3."NO1") = q1."NO1" should not be
considered when trying
+ //to determine if q1 in the outer query has equality
comparisons.
+ //Similarly, the predicate q3.reportto_no = q2.no1
should not be
+ //considered when trying to determine if q2 in the
outer query has
+ //equality comparisons. To achieve this, predicates
based on exists base
+ //table q3 (the first and the last predicate) should be
removed while
+ //evaluating outer query for uniqueness.
+ //
+ if (fbt.getExistsBaseTable())
+ {
+ int existsTableNumber = fbt.getTableNumber();
+ int predicatesTempSize = predicatesTemp.size();
+ for (int predicatesTempIndex =
predicatesTempSize-1;
+ predicatesTempIndex >= 0;
predicatesTempIndex--)
+ {
+ AndNode topAndNode = (AndNode)
+ ((Predicate)
predicatesTemp.elementAt(predicatesTempIndex)).getAndNode();
+
+ for (ValueNode whereWalker =
topAndNode; whereWalker instanceof AndNode;
+ whereWalker = ((AndNode)
whereWalker).getRightOperand())
+ {
+ // See if this is a candidate =
+ AndNode and = (AndNode)
whereWalker;
+
+ //we only need to worry about
equality predicates because only those
+ //predicates are considered
during DISTINCT elimination.
+ if
(!and.getLeftOperand().isRelationalOperator() ||
+
!(((RelationalOperator)(and.getLeftOperand())).getOperator() ==
+
RelationalOperator.EQUALS_RELOP))
+ {
+ continue;
+ }
+
+ JBitSet referencedTables =
and.getLeftOperand().getTablesReferenced();
+ if
(referencedTables.get(existsTableNumber))
+ {
+
predicatesTemp.removeElementAt(predicatesTempIndex);
+ break;
+ }
+ }
+ }
+ }
}
/* Build an array of tableNumbers from this query block.
@@ -1245,7 +1323,7 @@
/* Now see if there are any equality conditions
* of interest in the where predicates.
*/
- wherePredicates.checkTopPredicatesForEqualsConditions(
+ predicatesTemp.checkTopPredicatesForEqualsConditions(
tableNumber,
eqOuterCols, tableNumbers,
tableColMap[index], resultColTable);
@@ -1298,7 +1376,7 @@
/* unique key join -
exists tables already marked as
* 1 row - so don't
need to look at them
*/
- if (!oneRow[i] &&
tableColMap[i][index].get(0))
+ if (!oneRow[i] &&
tableColMap[i][index].get(0))
{
oneRow[i] =
true;
foundOneRow =
true;
Index:
java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
===================================================================
---
java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
(revision 169642)
+++
java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
(working copy)
@@ -12,6 +12,11 @@
create unique index three_c1 on three(c1);
create table four(c1 int, c2 int, c3 int, c4 int, c5 int);
create unique index four_c1c3 on four(c1, c3);
+CREATE TABLE "APP"."IDEPT" ("DISCRIM_DEPT" VARCHAR(32), "NO1" INTEGER NOT
NULL,
+"NAME" VARCHAR(50), "AUDITOR_NO" INTEGER, "REPORTTO_NO" INTEGER,
"HARDWAREASSET"
+ VARCHAR(15), "SOFTWAREASSET" VARCHAR(15));
+-- primary/unique
+ALTER TABLE "APP"."IDEPT" ADD CONSTRAINT "PK_IDEPT" PRIMARY KEY ("NO1");
insert into one values (1, 1, 1, 1, 1);
insert into one values (2, 1, 1, 1, 1);
@@ -51,6 +56,12 @@
insert into four values (3, 1, 2, 1, 1);
insert into four values (3, 1, 3, 1, 1);
+insert into idept values ('Dept', 1, 'Department1', null, null, null, null);
+insert into idept values ('HardwareDept', 2, 'Department2', 25, 1,
'hardwareaset2', null);
+insert into idept values ('HardwareDept', 3, 'Department3', 25, 2,
'hardwareaset3', null);
+insert into idept values ('SoftwareDept', 4, 'Department4', 25, 1, null,
'softwareasset4');
+insert into idept values ('SoftwareDept', 5, 'Department5', 30, 4, null,
'softwareasset5');
+
call SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
maximumdisplaywidth 20000;
@@ -61,6 +72,24 @@
-- Following runtime statistics output should have Distinct Scan in it
values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
+--Derby251 Distinct should not get eliminated for following query
+--because there is no equality condition on unique column of table
+--in the outside query
+select distinct q1."NO1", q1."NAME", q1."AUDITOR_NO",
+q1."REPORTTO_NO", q1."DISCRIM_DEPT", q1."SOFTWAREASSET" from
+IDEPT q1, IDEPT q2 where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+ and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1"
+<> ALL ( select q3."NO1" from IDEPT q3 where ( (
+q3."DISCRIM_DEPT" = 'Dept') or ( q3."DISCRIM_DEPT" =
+'HardwareDept') or ( q3."DISCRIM_DEPT" = 'SoftwareDept') )
+and ( q3."REPORTTO_NO" = q2."NO1") ) ) ;
+--
+--Another test case of Derby251 where the exists table column is embedded in
an expression.
+select distinct q1."NO1" from IDEPT q1, IDEPT q2
+where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <> ALL
+(select q3."NO1" from IDEPT q3 where ( ABS(q3."REPORTTO_NO") = q2."NO1")));
+
-- result ordering is not guaranteed, but order by clause will change how
-- distinct is executed. So test by retrieving data into a temp table and
-- return results ordered after making sure the query was executed as expected.
@@ -170,3 +199,4 @@
drop table two;
drop table three;
drop table four;
+drop table idept;
Index:
java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
===================================================================
---
java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
(revision 169642)
+++
java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
(working copy)
@@ -19,6 +19,13 @@
0 rows inserted/updated/deleted
ij> create unique index four_c1c3 on four(c1, c3);
0 rows inserted/updated/deleted
+ij> CREATE TABLE "APP"."IDEPT" ("DISCRIM_DEPT" VARCHAR(32), "NO1" INTEGER NOT
NULL,
+"NAME" VARCHAR(50), "AUDITOR_NO" INTEGER, "REPORTTO_NO" INTEGER,
"HARDWAREASSET"
+ VARCHAR(15), "SOFTWAREASSET" VARCHAR(15));
+0 rows inserted/updated/deleted
+ij> -- primary/unique
+ALTER TABLE "APP"."IDEPT" ADD CONSTRAINT "PK_IDEPT" PRIMARY KEY ("NO1");
+0 rows inserted/updated/deleted
ij> insert into one values (1, 1, 1, 1, 1);
1 row inserted/updated/deleted
ij> insert into one values (2, 1, 1, 1, 1);
@@ -87,6 +94,16 @@
1 row inserted/updated/deleted
ij> insert into four values (3, 1, 3, 1, 1);
1 row inserted/updated/deleted
+ij> insert into idept values ('Dept', 1, 'Department1', null, null, null,
null);
+1 row inserted/updated/deleted
+ij> insert into idept values ('HardwareDept', 2, 'Department2', 25, 1,
'hardwareaset2', null);
+1 row inserted/updated/deleted
+ij> insert into idept values ('HardwareDept', 3, 'Department3', 25, 2,
'hardwareaset3', null);
+1 row inserted/updated/deleted
+ij> insert into idept values ('SoftwareDept', 4, 'Department4', 25, 1, null,
'softwareasset4');
+1 row inserted/updated/deleted
+ij> insert into idept values ('SoftwareDept', 5, 'Department5', 30, 4, null,
'softwareasset5');
+1 row inserted/updated/deleted
ij> call SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
0 rows inserted/updated/deleted
ij> maximumdisplaywidth 20000;
@@ -143,6 +160,31 @@
None
next qualifiers:
None
+ij> --Derby251 Distinct should not get eliminated for following query
+--because there is no equality condition on unique column of table
+--in the outside query
+select distinct q1."NO1", q1."NAME", q1."AUDITOR_NO",
+q1."REPORTTO_NO", q1."DISCRIM_DEPT", q1."SOFTWAREASSET" from
+IDEPT q1, IDEPT q2 where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+ and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1"
+<> ALL ( select q3."NO1" from IDEPT q3 where ( (
+q3."DISCRIM_DEPT" = 'Dept') or ( q3."DISCRIM_DEPT" =
+'HardwareDept') or ( q3."DISCRIM_DEPT" = 'SoftwareDept') )
+and ( q3."REPORTTO_NO" = q2."NO1") ) ) ;
+NO1 |NAME |AUDITOR_NO
|REPORTTO_NO|DISCRIM_DEPT |SOFTWAREASSET
+---------------------------------------------------------------------------------------------------------------------------------------
+4 |Department4 |25 |1
|SoftwareDept |softwareasset4
+5 |Department5 |30 |4
|SoftwareDept |softwareasset5
+ij> --
+--Another test case of Derby251 where the exists table column is embedded in
an expression.
+select distinct q1."NO1" from IDEPT q1, IDEPT q2
+where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <> ALL
+(select q3."NO1" from IDEPT q3 where ( ABS(q3."REPORTTO_NO") = q2."NO1")));
+NO1
+-----------
+4
+5
ij> -- result ordering is not guaranteed, but order by clause will change how
-- distinct is executed. So test by retrieving data into a temp table and
-- return results ordered after making sure the query was executed as expected.
@@ -2319,4 +2361,6 @@
0 rows inserted/updated/deleted
ij> drop table four;
0 rows inserted/updated/deleted
+ij> drop table idept;
+0 rows inserted/updated/deleted
ij>