[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Knut Anders Hatlen (JIRA) Tue, 16 Dec 2008 14:55:10 -0800

     [ 
https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

    Attachment: derby-3975-1a.stat
                derby-3975-1a.diff

The attached patch attempts to fix the problem by implementing a hashCode() 
method in CollatorSQLChar, CollatorSQLVarchar, CollatorSQLLongvarchar and 
CollatorSQLClob based on CollationKey.hashCode(). It also extends 
CollationTest.compareAgrave() with a test case for SELECT DISTINCT, and makes 
it test both CHAR and VARCHAR (previously it only tested VARCHAR). 
CollationTest fails without the fix and passes with the fix. The test is based 
on the fact that in the French locale, À (Unicode code point 00C0) is the same 
as À ('A' + Unicode code point 0300), whereas they are different in UCS_BASIC.

I will start the regression tests now.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, 
> derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. 
> When I tried queries with SELECT DISTINCT, they didn't always eliminate all 
> duplicates. Here's an example:
> ij> connect 
> 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not 
> eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the 
> second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Reply via email to