[ http://issues.apache.org/jira/browse/DERBY-504?page=comments#action_12322650 ]
Knut Anders Hatlen commented on DERBY-504: ------------------------------------------ It is true that the change in SelectNode is doing redundant work, but not because (resultColumns.countNumberOfSimpleColumnReferences() == resultColumns.size()) guarantees that all result columns are simple columns (if we by "simple columns" mean a column in a base table, no aggregates etc). E.g. in the query 'SELECT a FROM (SELECT AVG(age) AS a FROM names) AS n', resultColumns.countNumberOfSimpleColumnReferences() equals resultColumns.size(), but the result column is not simple. The redundancy is the other way around: If (but not only if) all colums are simple, then (countNumberOfSimpleColumnReferences() == size()) is true. I can submit a patch which removes this redundant checking. It doesn't seem like ResultColumnList.countNumberOfSimpleColumnReferences() is used anywhere else in the code. If I remove the call to countNumberOfSimpleColumnReferences() and it is not used anywhere else, should I then also remove the definition of the method to make the code cleaner, or should I leave the method in case it would be needed in the future? > SELECT DISTINCT returns duplicates when selecting from subselects > ----------------------------------------------------------------- > > Key: DERBY-504 > URL: http://issues.apache.org/jira/browse/DERBY-504 > Project: Derby > Type: Bug > Components: SQL > Versions: 10.2.0.0 > Environment: Latest development sources (SVN revision 232227), Sun JDK 1.5, > Solaris/x86 > Reporter: Knut Anders Hatlen > Assignee: Knut Anders Hatlen > Priority: Minor > Attachments: DERBY-504.diff, DERBY-504.stat, DERBY-504_b.diff, > DERBY-504_b.stat, DERBY-504_c-CRLF.diff, DERBY-504_c-CRLF.diff, > DERBY-504_c.diff, DERBY-504_c.stat > > When one performs a select distinct on a table generated by a subselect, > there sometimes are duplicates in the result. The following example shows the > problem: > ij> CREATE TABLE names (id INT PRIMARY KEY, name VARCHAR(10)); > 0 rows inserted/updated/deleted > ij> INSERT INTO names (id, name) VALUES > (1, 'Anna'), (2, 'Ben'), (3, 'Carl'), > (4, 'Carl'), (5, 'Ben'), (6, 'Anna'); > 6 rows inserted/updated/deleted > ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n; > NAME > ---------- > Anna > Ben > Carl > Carl > Ben > Anna > Six names are returned, although only three names should have been returned. > When the result is explicitly sorted (using ORDER BY) or the id column is > removed from the subselect, the query returns three names as expected: > ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n ORDER BY > name; > NAME > ---------- > Anna > Ben > Carl > 3 rows selected > ij> SELECT DISTINCT(name) FROM (SELECT name FROM names) AS n; > NAME > ---------- > Anna > Ben > Carl > 3 rows selected -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
