[jira] [Updated] (DERBY-6227) Distinct aggregates don't work well with territory-based collation

Knut Anders Hatlen (JIRA) Wed, 18 Jun 2014 06:43:46 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Knut Anders Hatlen updated DERBY-6227:
--------------------------------------

    Attachment: d6227-1a.diff

I think the fix for this bug is to make the list of distinct values hold 
DataValueDescriptors instead of java.lang.String objects, since the 
DataValueDescriptor classes implement equals() in a way that takes the database 
collation into consideration.

The attached patch [^d6227-1a.diff] changes GroupedAggregateResultSet so that 
it does that. All regression tests ran cleanly with that patch, including a new 
test case for this bug.

I wasn't quite sure if it was safe to store the DVDs directly in the list of 
distinct values, or if they would need to be cloned first (because some result 
sets reuse old DVDs when reading new rows, and we don't want the values in the 
list to change). I concluded that it was safe to store them directly, because 
distinct aggregates always take their rows from a sorter and not directly from 
the result set, and the sorter returns clones of the original DVDs. I added an 
assert that verifies that the row is read from the sorter, and a comment that 
says we should check ResultSet.needsToClone() if we ever start reading directly 
from the result set when processing distinct aggregates.

> Distinct aggregates don't work well with territory-based collation
> ------------------------------------------------------------------
>
>                 Key: DERBY-6227
>                 URL: https://issues.apache.org/jira/browse/DERBY-6227
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0, 10.6.2.1, 10.7.1.1, 10.8.1.2, 10.8.2.2, 
> 10.8.3.0, 10.9.1.0, 10.10.1.1
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>              Labels: derby_triage10_11
>         Attachments: d6227-1a.diff
>
>
>  When working on DERBY-5840, I noticed that GroupedAggregateResultSet would 
> do duplicate elimination by comparing the java.lang.String representation of 
> the values. With territory-based collation, it is possible that two values 
> that have different java.lang.String representation should be considered 
> duplicates, and this logic will produce incorrect results.
> Example:
> ij version 10.10
> ij> connect 
> 'jdbc:derby:memory:db;territory=en_US;collation=TERRITORY_BASED:PRIMARY;create=true';
> ij> create table t(i int, s varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values (1, 'a'), (1, 'a'), (2, 'b'), (2, 'B'), (3, 'a'), 
> (3, 'A'), (3, 'b'), (3, 'B'), (3, 'c');
> 9 rows inserted/updated/deleted
> ij> select distinct s from t;
> S         
> ----------
> b         
> a         
> c         
> 3 rows selected
> ij> select i, count(distinct s) from t group by i;
> I          |2          
> -----------------------
> 1          |1          
> 2          |2          
> 3          |5          
> 3 rows selected
> I would have expected the last query to return
> (1, 1)
> (2, 1)
> (3, 3)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

[jira] [Updated] (DERBY-6227) Distinct aggregates don't work well with territory-based collation

Reply via email to