[jira] Commented: (DERBY-504) SELECT DISTINCT returns duplicates when selecting from subselects

Knut Anders Hatlen (JIRA) Mon, 05 Sep 2005 02:27:35 -0700

    [ http://issues.apache.org/jira/browse/DERBY-504?page=comments#action_12322650 ]


Knut Anders Hatlen commented on DERBY-504:
------------------------------------------

It is true that the change in SelectNode is doing redundant work, but not 
because (resultColumns.countNumberOfSimpleColumnReferences() == 
resultColumns.size()) guarantees that all result columns are simple columns (if 
by "simple columns" we mean columns of a base table, with no aggregates etc.). 
For example, in the query 'SELECT a FROM (SELECT AVG(age) AS a FROM names) AS n', 
resultColumns.countNumberOfSimpleColumnReferences() equals 
resultColumns.size(), but the result column is not simple. The redundancy is 
the other way around: if all columns are simple, then 
(countNumberOfSimpleColumnReferences() == size()) is true, but the converse 
does not hold.
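A minimal Java sketch of the implication above, under a hypothetical, simplified model: the `Column` record, the `Source` enum and the helper methods are illustrative names, not Derby's `ResultColumn`/`ResultColumnList` API. The analogue of countNumberOfSimpleColumnReferences() only inspects each column's top-level expression, so "all columns simple" implies "count == size", but not the other way around.

```java
import java.util.List;

// Hypothetical, simplified model of result columns; NOT Derby's actual API.
public class SimpleColumnCheck {

    /** What a result column's value ultimately comes from (hypothetical). */
    enum Source { BASE_TABLE_COLUMN, AGGREGATE, EXPRESSION }

    /**
     * isColumnReference: the column's own top-level expression is a reference
     * to a column of the FROM list (which may itself be a derived table).
     * source: what that value finally resolves to.
     */
    record Column(String name, boolean isColumnReference, Source source) {}

    // Rough analogue of countNumberOfSimpleColumnReferences(): only the
    // top-level expression of each column is examined.
    static long countColumnReferences(List<Column> cols) {
        return cols.stream().filter(Column::isColumnReference).count();
    }

    static boolean countEqualsSize(List<Column> cols) {
        return countColumnReferences(cols) == cols.size();
    }

    // "Simple" in the stricter sense of the comment: a plain base-table column.
    static boolean allSimple(List<Column> cols) {
        return cols.stream().allMatch(
            c -> c.isColumnReference() && c.source() == Source.BASE_TABLE_COLUMN);
    }

    public static void main(String[] args) {
        // SELECT a FROM (SELECT AVG(age) AS a FROM names) AS n
        List<Column> overAggregate =
            List.of(new Column("A", true, Source.AGGREGATE));
        // SELECT name FROM names
        List<Column> plain =
            List.of(new Column("NAME", true, Source.BASE_TABLE_COLUMN));

        // prints: true false  (count check passes, yet the column is not simple)
        System.out.println(countEqualsSize(overAggregate) + " " + allSimple(overAggregate));
        // prints: true true
        System.out.println(countEqualsSize(plain) + " " + allSimple(plain));
    }
}
```

The first case models the AVG(age) query from the comment: the outer column is a column reference to the derived table, so the count check passes even though its value comes from an aggregate.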

I can submit a patch which removes this redundant check. 
ResultColumnList.countNumberOfSimpleColumnReferences() does not appear to be 
used anywhere else in the code. If I remove the call to 
countNumberOfSimpleColumnReferences() and it is not used anywhere else, should 
I also remove the method definition to make the code cleaner, or should I 
leave the method in case it is needed in the future?

> SELECT DISTINCT returns duplicates when selecting from subselects
> -----------------------------------------------------------------
>
>          Key: DERBY-504
>          URL: http://issues.apache.org/jira/browse/DERBY-504
>      Project: Derby
>         Type: Bug
>   Components: SQL
>     Versions: 10.2.0.0
>  Environment: Latest development sources (SVN revision 232227), Sun JDK 1.5, 
> Solaris/x86
>     Reporter: Knut Anders Hatlen
>     Assignee: Knut Anders Hatlen
>     Priority: Minor
>  Attachments: DERBY-504.diff, DERBY-504.stat, DERBY-504_b.diff, 
> DERBY-504_b.stat, DERBY-504_c-CRLF.diff, DERBY-504_c-CRLF.diff, 
> DERBY-504_c.diff, DERBY-504_c.stat
>
> When one performs a SELECT DISTINCT on a table generated by a subselect, 
> the result sometimes contains duplicates. The following example shows the 
> problem:
> ij> CREATE TABLE names (id INT PRIMARY KEY, name VARCHAR(10));
> 0 rows inserted/updated/deleted
> ij> INSERT INTO names (id, name) VALUES
>        (1, 'Anna'), (2, 'Ben'), (3, 'Carl'),
>        (4, 'Carl'), (5, 'Ben'), (6, 'Anna');
> 6 rows inserted/updated/deleted
> ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n;
> NAME      
> ----------
> Anna      
> Ben       
> Carl      
> Carl      
> Ben       
> Anna      
> Six names are returned, although only three names should have been returned.
> When the result is explicitly sorted (using ORDER BY) or the id column is 
> removed from the subselect, the query returns three names as expected:
> ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n ORDER BY name;
> NAME      
> ----------
> Anna      
> Ben       
> Carl      
> 3 rows selected
> ij> SELECT DISTINCT(name) FROM (SELECT name FROM names) AS n;
> NAME      
> ----------
> Anna      
> Ben       
> Carl      
> 3 rows selected
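To make the expected semantics concrete, here is a small Java sketch using plain collections (no Derby code; `Row`, `NAMES`, `distinctNames` and `distinctRows` are hypothetical names). It contrasts projecting onto NAME before removing duplicates, which yields the three expected names, with deduplicating the subselect's full (id, name) rows, which removes nothing because id is the primary key. The second variant only illustrates that uniqueness of the subselect's rows says nothing about uniqueness of the projected column; it is not a claim about where in Derby the bug lies.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustration of SELECT DISTINCT semantics over the example data; not Derby code.
public class DistinctOverSubselect {

    record Row(int id, String name) {}

    // Contents of the NAMES table from the ij session above.
    static final List<Row> NAMES = List.of(
        new Row(1, "Anna"), new Row(2, "Ben"), new Row(3, "Carl"),
        new Row(4, "Carl"), new Row(5, "Ben"), new Row(6, "Anna"));

    // SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n:
    // project onto NAME first, then drop duplicates (first-seen order kept).
    static List<String> distinctNames(List<Row> rows) {
        Set<String> seen = new LinkedHashSet<>();
        for (Row r : rows) {
            seen.add(r.name());
        }
        return List.copyOf(seen);
    }

    // Deduplicating whole (id, name) rows instead: every row is already unique
    // because ID is the primary key, so nothing is removed.
    static List<Row> distinctRows(List<Row> rows) {
        return List.copyOf(new LinkedHashSet<>(rows));
    }

    public static void main(String[] args) {
        System.out.println(distinctNames(NAMES));       // prints: [Anna, Ben, Carl]
        System.out.println(distinctRows(NAMES).size()); // prints: 6
    }
}
```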

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
