Thanks for committing the patch. Also, the tests ran fine with no
failure with the patch.
Mamta
On 5/11/05, Satheesh Bandaram <[EMAIL PROTECTED]> wrote:
> Submitted this patch. Thanks for addressing the review comments quickly.
>
> Satheesh
>
> Sending
> java\engine\org\apache\derby\impl\sql\compile\FromList.java
> Sending
> java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
> Sending
> java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
> Transmitting file data ...
> Committed revision 169735.
>
> Mamta Satoor wrote:
>
> Hi Jack,
Appreciate you taking the time to do the review and catching
> the
predicate with expression. I have changed the code and also added a
test
> case for it. I have fired the derbyall suite on my codeline to
make sure
> everything else runs smoothly. Attached is the updated
> patch
anyways.
thanks,
Mamta
On 5/11/05, Jack Klebanoff
> <[EMAIL PROTECTED]> wrote:
> The patch does not completely fix the problem. It does not handle the
case
> where the exists table column is embedded in an expression. Try
> the
following variation on the test select:
select distinct q1."NO1" from
> IDEPT q1, IDEPT q2
where ( q2."DISCRIM_DEPT" = 'HardwareDept')
and (
> q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <> ALL
(select q3."NO1"
> from IDEPT q3 where ( ABS(q3."REPORTTO_NO") = q2."NO1")))
Because
> q3."REPORTTO_NO" is inside a call to ABS the code added
> to
FromList.returnsAtMostSingleRow does not see it.
I would suggest using
> JBitSet referencedTables =
and.getLeftOperand().getTablesReferenced();
if(
> referencedTables.get( existsTableNumber))
> {
predicatesTemp.removeElementAt(predicatesTempIndex);
break;
}
instead
> of
BinaryRelationalOperatorNode beon =
(BinaryRelationalOperatorNode)
> and.getLeftOperand();
ValueNode left = beon.getLeftOperand();
ValueNode
> right = beon.getRightOperand();
/* If left or right side of predicate
> refer to
exists base table,
then remove it */
if ((left instanceof
> ColumnReference) &&
((ColumnReference) left).getTableNumber()
> ==
existsTableNumber)
> {
predicatesTemp.removeElementAt(predicatesTempIndex);
break;
}
else if
> ((right instanceof ColumnReference) &&
((ColumnReference)
> right).getTableNumber()
== existsTableNumber)
> {
predicatesTemp.removeElementAt(predicatesTempIndex);
break;
}
I have
> tried it out and it seems to work.
Jack Klebanoff
Mamta Satoor wrote:
> Hi,
I have a patch for this optimizer bug. Basically, the issue turned
> out
to be the logic for DISTINCT elimination. During the optimization
phase,
> if a query has DISTINCT clause, then impl.sql.compile.FromList
class's
> returnsAtMostSingleRow() method gets called. This method
returns true if the
> method concludes that DISTINCT in the query is
redundant (based on a complex
> logic that decides that the query is
going to return distinct rows on its
> own without the DISTINCT clause.
The details of the current logic for
> DISTINCT elimination can be found
in the comments at the method level.)
For
> the query in question in this bug, the method returned true for
DISTINCT
> elimination which is wrong. The explanation is as follows.
First of all, I
> was able to simplify the query reported in the bug to
following
> query.
select distinct q1."NO1" from IDEPT q1, IDEPT q2
where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and ( q1."NO1" <> ALL
(select q3."NO1" from IDEPT q3 where
> (q3."REPORTTO_NO" = q2."NO1")))
This query gets converted to following
> during optimization
select distinct q1."NO1" from IDEPT q1, IDEPT q2
where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and not exists (
(select q3."NO1" from IDEPT q3 where
(
> q3."REPORTTO_NO" = q2."NO1" and q3."NO1" = q1."NO1") ) ) ;
This optimized
> query has 4 predicates associated with it
q3.reportto_no =
> q2.no1
q2.discrim_dept = 'HardwareDept'
q1.descrim_dept =
> 'SoftwareDept'
q1.no1 = q3.no1
Next, on this optimized query(since it has
> DISTINCT clause in it), the
returnsAtMostSingleRow() method gets called. The
> method incorrectly
returns true indicating that DISTINCT can be eliminated.
> The reason
for this is that method is looking at predicates that belong to
> the
inside query with the exists clause (which is on table IDEPT q3)
> to
determine DISTINCT elimination for the outer level.
The fix is that the
> predicates from the exists query, (in this
particular case, q3."NO1" =
> q1."NO1" and q3.reportto_no = q2.no1)
should not be considered when deciding
> elimination of DISTINCT in the
outer query. That is what the attached patch
> does.
Hope this helps understand the problem and the proposed fix for
> it.
The files impacted by the change are as follows
svn stat
M
> java\engine\org\apache\derby\impl\sql\compile\FromList.java
M
> java\testing\org\apache\derbyTesting\functionTests\tests\lang\distinctElimination.sql
M
> java\testing\org\apache\derbyTesting\functionTests\master\distinctElimination.out
Please
> send in comments you may have. I have run the existing tests
and the patch
> didn't cause any failures.
thanks,
Mamta
> >
> ________________________________
>
Index:
> java/engine/org/apache/derby/impl/sql/compile/FromList.java
===================================================================
---
> java/engine/org/apache/derby/impl/sql/compile/FromList.java
> (revision 169642)
+++
> java/engine/org/apache/derby/impl/sql/compile/FromList.java
> (working copy)
@@ -1040,7 +1040,7 @@
}
/**
- * Decrement (query block)
> level (0-based) for
+ * Decrement (query block) level (0-based) for
* all
> of the tables in this from list.
* This is useful when flattening a
> subquery.
*
@@ -1069,7 +1069,7 @@
}
}
-
+
/**
* This method is used
> for both subquery flattening and distinct
* elimination based on a
> uniqueness condition. For subquery
@@ -1079,7 +1079,7 @@
* any duplicates.
> * This is true if every table in the from list is
* (a base table and the
> set of columns from the table that
- * are in equality comparisons with
> expressions that do not include columns
+ * are in equality comparisons
> with expressions that do not include columns
* from the same table is a
> superset of any unique index
* on the table) or an EXISTS FBT. In addition,
> at least 1 of the tables
* in the list has a set of columns in equality
> comparisons with expressions
@@ -1105,28 +1105,28 @@
* create an array of
> columns from the table(eqOuterCol)
* (this is used to determine that only
> one row will be returned
* from a join)
- *
+ *
* if the current table is
> the table for the result columns
* set the result columns in the eqOuterCol
> and tableColMap
* (if these columns are a superset of a unique index and
*
> all joining tables result in only one row, the
* results will be distinct)
> * go through all the predicates and update tableColMap and
- * eqOuterCol
> with join columns and correlation variables,
+ * eqOuterCol with join
> columns and correlation variables,
* parameters and constants
* since
> setting constants, correlation variables and parameters,
- * reduces the
> number of columns required for uniqueness in a
+ * reduces the number of
> columns required for uniqueness in a
* multi-column index, they are set for
> all the tables (if the
* table is not the result table, in this case only
> the column of the
* result table is set)
* join columns are just updated
> for the column in the row of the
* joining table.
- *
- * check if the
> marked columns in tableColMap are a superset of a unique
- * index
+ *
+ *
> check if the marked columns in tableColMap are a superset of a unique
+ *
> index
* (This means that the join will only produce 1 row when joined
*
> with 1 row of another table)
- * check that there is a least one table for
> which the columns in
+ * check that there is a least one table for which
> the columns in
* eqOuterCol(i.e. constant values) are a superset of a
> unique index
* (This quarantees that there will be only one row selected
*
> from this table).
@@ -1134,8 +1134,8 @@
* Once all tables have been
> evaluated, check that all the tables can be
* joined by unique index or
> will have only one row
*
- *
*
+ *
* @param rcl If non-null, the RCL
> from the query block.
* If non-null for subqueries, then entry can
* be
> considered as part of an = comparison.
@@ -1150,8 +1150,8 @@
*
*
> @exception StandardException Thrown on error
*/
- boolean
> returnsAtMostSingleRow(ResultColumnList rcl,
- ValueNode
> whereClause,
+ boolean
> returnsAtMostSingleRow(ResultColumnList rcl,
+ ValueNode
> whereClause,
PredicateList wherePredicates,
DataDictionary dd)
throws
> StandardException
@@ -1160,6 +1160,13 @@
int[] tableNumbers;
> ColumnReference additionalCR = null;
+ PredicateList predicatesTemp;
+
> predicatesTemp = (PredicateList) getNodeFactory().getNode(
+
> C_NodeTypes.PREDICATE_LIST, getContextManager());
+ int wherePredicatesSize
> = wherePredicates.size();
+ for (int index = 0; index < wherePredicatesSize;
> index++)
+
> predicatesTemp.addPredicate((Predicate)wherePredicates.elementAt(index));
+
> /* When considering subquery flattening, we are interested
* in the 1st
> (and only) entry in the RCL. (The RCL will be
* null if result column is
> not of interest for subquery flattening.)
@@ -1193,6 +1200,77 @@
{
return
> false;
}
+ FromBaseTable fbt = (FromBaseTable) prn.getChildResult();
+
> //Following for loop code is to take care of Derby-251 (DISTINCT returns
+
> //duplicate rows).
+ //Derby-251 returned duplicate rows because we were
> looking at predicates
+ //that belong to existsTable to determine DISTINCT
> elimination
+ //
+ //(Check method level comments to understand DISTINCT
> elimination rules.)
+ //
+ //For one specific example, consider the query
> below
+ //select distinct q1."NO1" from IDEPT q1, IDEPT q2
+ //where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
+ //and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and ( q1."NO1" <> ALL
+ //(select q3."NO1" from IDEPT q3
> where (q3."REPORTTO_NO" = q2."NO1")))
+ //(select q3."NO1" from IDEPT q3
> where ( ABS(q3."REPORTTO_NO") = q2."NO1")))
+ //
+ //Table IDEPT in the
> query above has a primary key defined on column "NO1"
+ //This query gets
> converted to following during optimization
+ //
+ //select distinct q1."NO1"
> from IDEPT q1, IDEPT q2
+ //where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+
> //and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and not exists (
+ //(select
> q3."NO1" from IDEPT q3 where
+ //( ( ABS(q3."REPORTTO_NO") = q2."NO1") and
> q3."NO1" = q1."NO1") ) ) ;
+ //
+ //For the optimized query above, Derby
> generates following predicates.
+ //ABS(q3.reportto_no) = q2.no1
+
> //q2.discrim_dept = 'HardwareDept'
+ //q1.descrim_dept = 'SoftwareDept'
+
> //q1.no1 = q3.no1
+ //The predicate ABS(q3."NO1") = q1."NO1" should not be
> considered when trying
+ //to determine if q1 in the outer query has
> equality comparisons.
+ //Similarly, the predicate q3.reportto_no = q2.no1
> should not be
+ //considered when trying to determine if q2 in the outer
> query has
+ //equality comparisons. To achieve this, predicates based on
> exists base
+ //table q3 (the first and the last predicate) should be
> removed while
+ //evaluating outer query for uniqueness.
+ //
+ if
> (fbt.getExistsBaseTable())
+ {
+ int existsTableNumber =
> fbt.getTableNumber();
+ int predicatesTempSize = predicatesTemp.size();
+
> for (int predicatesTempIndex = predicatesTempSize-1;
+ predicatesTempIndex
> >= 0; predicatesTempIndex--)
+ {
+ AndNode topAndNode = (AndNode)
+
> ((Predicate)
> predicatesTemp.elementAt(predicatesTempIndex)).getAndNode();
+
+
> for (ValueNode whereWalker = topAndNode; whereWalker instanceof AndNode;
+
> whereWalker = ((AndNode) whereWalker).getRightOperand())
+ {
+ // See if
> this is a candidate =
+ AndNode and = (AndNode) whereWalker;
+
+ //we only
> need to worry about equality predicates because only those
+ //predicates
> are considered during DISTINCT elimination.
+ if
> (!and.getLeftOperand().isRelationalOperator() ||
+
> !(((RelationalOperator)(and.getLeftOperand())).getOperator() ==
+
> RelationalOperator.EQUALS_RELOP))
+ {
+ continue;
+ }
+
+ JBitSet
> referencedTables = and.getLeftOperand().getTablesReferenced();
+ if
> (referencedTables.get(existsTableNumber))
+ {
+
> predicatesTemp.removeElementAt(predicatesTempIndex);
+ break;
+ }
+ }
+ }
+
> }
}
/* Build an array of tableNumbers from this query block.
@@ -1245,7
> +1323,7 @@
/* Now see if there are any equality conditions
* of interest
> in the where predicates.
*/
-
> wherePredicates.checkTopPredicatesForEqualsConditions(
+
> predicatesTemp.checkTopPredicatesForEqualsConditions(
> tableNumber, eqOuterCols, tableNumbers,
tableColMap[index],
> resultColTable);
@@ -1298,7 +1376,7 @@
/* unique key join - exists tables
> already marked as
* 1 row - so don't need to look at them
*/
- if
> (!oneRow[i] && tableColMap[i][index].get(0))
+ if (!oneRow[i] &&
> tableColMap[i][index].get(0))
{
oneRow[i] = true;
foundOneRow =
> true;
Index:
> java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
===================================================================
---
> java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
> (revision 169642)
+++
> java/testing/org/apache/derbyTesting/functionTests/tests/lang/distinctElimination.sql
> (working copy)
@@ -12,6 +12,11 @@
create unique index three_c1 on
> three(c1);
create table four(c1 int, c2 int, c3 int, c4 int, c5 int);
> create unique index four_c1c3 on four(c1, c3);
+CREATE TABLE "APP"."IDEPT"
> ("DISCRIM_DEPT" VARCHAR(32), "NO1" INTEGER NOT NULL,
+"NAME" VARCHAR(50),
> "AUDITOR_NO" INTEGER, "REPORTTO_NO" INTEGER, "HARDWAREASSET"
+ VARCHAR(15),
> "SOFTWAREASSET" VARCHAR(15));
+-- primary/unique
+ALTER TABLE "APP"."IDEPT"
> ADD CONSTRAINT "PK_IDEPT" PRIMARY KEY ("NO1");
insert into one values (1,
> 1, 1, 1, 1);
insert into one values (2, 1, 1, 1, 1);
@@ -51,6 +56,12 @@
> insert into four values (3, 1, 2, 1, 1);
insert into four values (3, 1, 3,
> 1, 1);
+insert into idept values ('Dept', 1, 'Department1', null, null,
> null, null);
+insert into idept values ('HardwareDept', 2, 'Department2',
> 25, 1, 'hardwareaset2', null);
+insert into idept values ('HardwareDept', 3,
> 'Department3', 25, 2, 'hardwareaset3', null);
+insert into idept values
> ('SoftwareDept', 4, 'Department4', 25, 1, null, 'softwareasset4');
+insert
> into idept values ('SoftwareDept', 5, 'Department5', 30, 4, null,
> 'softwareasset5');
+
call
> SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
> maximumdisplaywidth 20000;
@@ -61,6 +72,24 @@
-- Following runtime
> statistics output should have Distinct Scan in it
values
> SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();
+--Derby251
> Distinct should not get eliminated for following query
+--because there is
> no equality condition on unique column of table
+--in the outside
> query
+select distinct q1."NO1", q1."NAME", q1."AUDITOR_NO",
>
+q1."REPORTTO_NO", q1."DISCRIM_DEPT", q1."SOFTWAREASSET" from
+IDEPT q1,
> IDEPT q2 where ( q2."DISCRIM_DEPT" = 'HardwareDept')
+ and (
> q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1"
+<> ALL ( select
> q3."NO1" from IDEPT q3 where ( (
+q3."DISCRIM_DEPT" = 'Dept') or (
> q3."DISCRIM_DEPT" =
+'HardwareDept') or ( q3."DISCRIM_DEPT" =
> 'SoftwareDept') )
+and ( q3."REPORTTO_NO" = q2."NO1") ) ) ;
+--
+--Another
> test case of Derby251 where the exists table column is embedded in an
> expression.
+select distinct q1."NO1" from IDEPT q1, IDEPT q2
+where (
> q2."DISCRIM_DEPT" = 'HardwareDept')
+and ( q1."DISCRIM_DEPT" =
> 'SoftwareDept') and ( q1."NO1" <> ALL
+(select q3."NO1" from IDEPT q3 where
> ( ABS(q3."REPORTTO_NO") = q2."NO1")));
+
-- result ordering is not
> guaranteed, but order by clause will change how
-- distinct is executed. So
> test by retrieving data into a temp table and
-- return results ordered
> after making sure the query was executed as expected.
@@ -170,3 +199,4 @@
> drop table two;
drop table three;
drop table four;
+drop table
> idept;
Index:
> java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
===================================================================
---
> java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
> (revision 169642)
+++
> java/testing/org/apache/derbyTesting/functionTests/master/distinctElimination.out
> (working copy)
@@ -19,6 +19,13 @@
0 rows inserted/updated/deleted
ij>
> create unique index four_c1c3 on four(c1, c3);
0 rows
> inserted/updated/deleted
+ij> CREATE TABLE "APP"."IDEPT" ("DISCRIM_DEPT"
> VARCHAR(32), "NO1" INTEGER NOT NULL,
+"NAME" VARCHAR(50), "AUDITOR_NO"
> INTEGER, "REPORTTO_NO" INTEGER, "HARDWAREASSET"
+ VARCHAR(15),
> "SOFTWAREASSET" VARCHAR(15));
+0 rows inserted/updated/deleted
+ij> --
> primary/unique
+ALTER TABLE "APP"."IDEPT" ADD CONSTRAINT "PK_IDEPT" PRIMARY
> KEY ("NO1");
+0 rows inserted/updated/deleted
ij> insert into one values
> (1, 1, 1, 1, 1);
1 row inserted/updated/deleted
ij> insert into one values
> (2, 1, 1, 1, 1);
@@ -87,6 +94,16 @@
1 row inserted/updated/deleted
ij>
> insert into four values (3, 1, 3, 1, 1);
1 row
> inserted/updated/deleted
+ij> insert into idept values ('Dept', 1,
> 'Department1', null, null, null, null);
+1 row inserted/updated/deleted
+ij>
> insert into idept values ('HardwareDept', 2, 'Department2', 25, 1,
> 'hardwareaset2', null);
+1 row inserted/updated/deleted
+ij> insert into
> idept values ('HardwareDept', 3, 'Department3', 25, 2, 'hardwareaset3',
> null);
+1 row inserted/updated/deleted
+ij> insert into idept values
> ('SoftwareDept', 4, 'Department4', 25, 1, null, 'softwareasset4');
+1 row
> inserted/updated/deleted
+ij> insert into idept values ('SoftwareDept', 5,
> 'Department5', 30, 4, null, 'softwareasset5');
+1 row
> inserted/updated/deleted
ij> call
> SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);
0 rows
> inserted/updated/deleted
ij> maximumdisplaywidth 20000;
@@ -143,6 +160,31
> @@
None
next qualifiers:
None
+ij> --Derby251 Distinct should not get
> eliminated for following query
+--because there is no equality condition on
> unique column of table
+--in the outside query
+select distinct q1."NO1",
> q1."NAME", q1."AUDITOR_NO",
+q1."REPORTTO_NO", q1."DISCRIM_DEPT",
> q1."SOFTWAREASSET" from
+IDEPT q1, IDEPT q2 where ( q2."DISCRIM_DEPT" =
> 'HardwareDept')
+ and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1"
>
+<> ALL ( select q3."NO1" from IDEPT q3 where ( (
+q3."DISCRIM_DEPT" =
> 'Dept') or ( q3."DISCRIM_DEPT" =
+'HardwareDept') or ( q3."DISCRIM_DEPT" =
> 'SoftwareDept') )
+and ( q3."REPORTTO_NO" = q2."NO1") ) ) ;
+NO1 |NAME
> |AUDITOR_NO |REPORTTO_NO|DISCRIM_DEPT |SOFTWAREASSET
>
+---------------------------------------------------------------------------------------------------------------------------------------
+4
> |Department4 |25 |1 |SoftwareDept |softwareasset4
+5 |Department5 |30 |4
> |SoftwareDept |softwareasset5
+ij> --
+--Another test case of Derby251
> where the exists table column is embedded in an expression.
+select distinct
> q1."NO1" from IDEPT q1, IDEPT q2
+where ( q2."DISCRIM_DEPT" =
> 'HardwareDept')
+and ( q1."DISCRIM_DEPT" = 'SoftwareDept') and ( q1."NO1" <>
> ALL
+(select q3."NO1" from IDEPT q3 where ( ABS(q3."REPORTTO_NO") =
> q2."NO1")));
+NO1
+-----------
+4
+5
ij> -- result ordering is not
> guaranteed, but order by clause will change how
-- distinct is executed. So
> test by retrieving data into a temp table and
-- return results ordered
> after making sure the query was executed as expected.
@@ -2319,4 +2361,6 @@
> 0 rows inserted/updated/deleted
ij> drop table four;
0 rows
> inserted/updated/deleted
+ij> drop table idept;
+0 rows
> inserted/updated/deleted
ij>
>