[
https://issues.apache.org/jira/browse/DERBY-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Knut Anders Hatlen updated DERBY-6227:
--------------------------------------
Attachment: d6227-1a.diff
I think the fix for this bug is to make the list of distinct values hold
DataValueDescriptors instead of java.lang.String objects, since the
DataValueDescriptor classes implement equals() in a way that takes the database
collation into consideration.
The attached patch [^d6227-1a.diff] changes GroupedAggregateResultSet so that
it does that. All regression tests ran cleanly with that patch, including a new
test case for this bug.
I wasn't quite sure if it was safe to store the DVDs directly in the list of
distinct values, or if they would need to be cloned first (because some result
sets reuse old DVDs when reading new rows, and we don't want the values in the
list to change). I concluded that it was safe to store them directly, because
distinct aggregates always take their rows from a sorter and not directly from
the result set, and the sorter returns clones of the original DVDs. I added an
assert that verifies that the row is read from the sorter, and a comment that
says we should check ResultSet.needsToClone() if we ever start reading directly
from the result set when processing distinct aggregates.
> Distinct aggregates don't work well with territory-based collation
> ------------------------------------------------------------------
>
> Key: DERBY-6227
> URL: https://issues.apache.org/jira/browse/DERBY-6227
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.6.1.0, 10.6.2.1, 10.7.1.1, 10.8.1.2, 10.8.2.2,
> 10.8.3.0, 10.9.1.0, 10.10.1.1
> Reporter: Knut Anders Hatlen
> Assignee: Knut Anders Hatlen
> Labels: derby_triage10_11
> Attachments: d6227-1a.diff
>
>
> When working on DERBY-5840, I noticed that GroupedAggregateResultSet would
> do duplicate elimination by comparing the java.lang.String representation of
> the values. With territory-based collation, it is possible that two values
> that have different java.lang.String representation should be considered
> duplicates, and this logic will produce incorrect results.
> Example:
> ij version 10.10
> ij> connect
> 'jdbc:derby:memory:db;territory=en_US;collation=TERRITORY_BASED:PRIMARY;create=true';
> ij> create table t(i int, s varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values (1, 'a'), (1, 'a'), (2, 'b'), (2, 'B'), (3, 'a'),
> (3, 'A'), (3, 'b'), (3, 'B'), (3, 'c');
> 9 rows inserted/updated/deleted
> ij> select distinct s from t;
> S
> ----------
> b
> a
> c
> 3 rows selected
> ij> select i, count(distinct s) from t group by i;
> I |2
> -----------------------
> 1 |1
> 2 |2
> 3 |5
> 3 rows selected
> I would have expected the last query to return
> (1, 1)
> (2, 1)
> (3, 3)
--
This message was sent by Atlassian JIRA
(v6.2#6252)