[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Knut Anders Hatlen (JIRA) Wed, 17 Dec 2008 05:08:10 -0800

    [ 
https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657373#action_12657373
 ]


Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

One thing I forgot to mention in the description of the patch:

The patch changes the hashCode() implementation in 
CollatorSQL{Char,Varchar,Longvarchar,Clob}, but it only tests CHAR and VARCHAR. 
That's because LONG VARCHAR and CLOB columns cannot be used in distinct queries 
without casting them to another data type first, and since you cannot compare 
two such columns with =, you cannot perform a hash join to test it either. I'm 
not aware of any other code than distinct scans and hash scans that will ever 
call the hashCode() methods of these objects, so I don't know of any way to 
test those two data types.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, 
> derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. 
> When I tried queries with SELECT DISTINCT, they didn't always eliminate all 
> duplicates. Here's an example:
> ij> connect 
> 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not 
> eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the 
> second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Reply via email to