SELECT DISTINCT returns duplicates when selecting from subselects
-----------------------------------------------------------------
Key: DERBY-504
URL: http://issues.apache.org/jira/browse/DERBY-504
Project: Derby
Type: Bug
Components: SQL
Versions: 10.2.0.0
Environment: Latest development sources (SVN revision 232227), Sun JDK 1.5, Solaris/x86
Reporter: Knut Anders Hatlen
Assignee: Knut Anders Hatlen
Priority: Minor
Attachments: DERBY-504.diff, DERBY-504.stat, DERBY-504_b.diff, DERBY-504_b.stat, DERBY-504_c-CRLF.diff, DERBY-504_c-CRLF.diff, DERBY-504_c.diff, DERBY-504_c.stat
When one performs a select distinct on a table generated by a subselect, there sometimes are duplicates in the result. The following example shows the problem:
ij> CREATE TABLE names (id INT PRIMARY KEY, name VARCHAR(10));
0 rows inserted/updated/deleted
ij> INSERT INTO names (id, name) VALUES
(1, 'Anna'), (2, 'Ben'), (3, 'Carl'),
(4, 'Carl'), (5, 'Ben'), (6, 'Anna');
6 rows inserted/updated/deleted
ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n;
NAME
----------
Anna
Ben
Carl
Carl
Ben
Anna
Six names are returned, although only three names should have been returned.
When the result is explicitly sorted (using ORDER BY) or the id column is removed from the subselect, the query returns three names as expected:
ij> SELECT DISTINCT(name) FROM (SELECT name, id FROM names) AS n ORDER BY name;
NAME
----------
Anna
Ben
Carl
3 rows selected
ij> SELECT DISTINCT(name) FROM (SELECT name FROM names) AS n;
NAME
----------
Anna
Ben
Carl
3 rows selected