[jira] [Commented] (DERBY-6607) Derby is using territory/collation for equality, not just ordering (incorrectly?)

Rick Hillegas (JIRA) Wed, 11 Jun 2014 05:27:38 -0700

    [ 
https://issues.apache.org/jira/browse/DERBY-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027692#comment-14027692
 ]


Rick Hillegas commented on DERBY-6607:
--------------------------------------

Hi Brett,

I agree with Knut that you will need a custom collator for this problem. It 
sounds like your out-of-the-box collator isn't subtle enough for Japanese. A 
collator is supposed to impose a linear order, and you are up against the 
antisymmetric law of linear orders (http://en.wikipedia.org/wiki/Total_order): 
if "word1 <= word2" and "word2 <= word1", then word1 and word2 are equal. 
Your collator is asserting both, so it treats the two words as the same 
value. One of those assertions has to be false in order for you to get the 
behavior you want.
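
As a rough illustration of the ordering argument only, here is a minimal Java 
sketch of a comparator that makes one of the two assertions false. The class 
name TieBreakingComparator and the tie-breaking rule are assumptions made for 
this example; they are not part of Derby or the JDK, and the sketch does not 
show how a custom collator would be plugged into Derby.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

/** Hypothetical helper for illustration; not a Derby or JDK class. */
final class TieBreakingComparator implements Comparator<String> {
    private final Collator base;

    TieBreakingComparator(Collator base) {
        this.base = base;
    }

    @Override
    public int compare(String a, String b) {
        int byCollator = base.compare(a, b);
        if (byCollator != 0) {
            return byCollator; // the base collator already separates the two
        }
        // Assumption: case folding keeps "Cat" and "cat" tied, while hiragana
        // and katakana have no case mapping and keep distinct code points.
        return a.toLowerCase(Locale.ROOT).compareTo(b.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Collator base = Collator.getInstance(Locale.JAPAN);
        base.setStrength(Collator.PRIMARY);
        Comparator<String> cmp = new TieBreakingComparator(base);
        System.out.println("Cat vs cat tied: " + (cmp.compare("Cat", "cat") == 0));
        System.out.println("kana forms tied: " + (cmp.compare("ケーキ", "けーき") == 0));
    }
}
```

One consequence of this particular tie-breaker: strings that differ only by 
accents, which a PRIMARY-strength collator treats as equal, would become 
distinct as well.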

Hope this helps,
-Rick

> Derby is using territory/collation for equality, not just ordering 
> (incorrectly?)
> ---------------------------------------------------------------------------------
>
>                 Key: DERBY-6607
>                 URL: https://issues.apache.org/jira/browse/DERBY-6607
>             Project: Derby
>          Issue Type: Bug
>          Components: Localization
>    Affects Versions: 10.10.2.0
>            Reporter: Brett Wooldridge
>
> We have a database where we want case-insensitivity, and therefore it was 
> created with collation=TERRITORY_BASED:PRIMARY.  We have customers in both 
> the United States (en_US) and in Japan (ja_JP).
> We have an issue in Japan.  Japanese has three character sets: hiragana, 
> katakana, and kanji.  Hiragana is a phonetic alphabet with 46 letters.  
> Katakana is an identical phonetic alphabet with 46 letters, written using 
> different character forms, and used for foreign words (words adopted from 
> other languages into Japanese).
> Here is the word 'cake' written in katakana: ケーキ (ke- ki)
> Here is the word 'cake' written in hiragana: けーき  (ke- ki)
> In terms of collation (ordering), Japanese consider these to be equal.  So, 
> in the following Java code, the call to 'compare()' would return 0:
> {code:java}
> import java.text.Collator;
> import java.util.Locale;
>
> Collator collator = Collator.getInstance(Locale.JAPAN);
> collator.setStrength(Collator.PRIMARY);  // ignore case and accent differences
> return collator.compare("ケーキ", "けーき");
> {code}
> And therein lies the issue.  With respect to _ordering_ they are indeed 
> equivalent, however Japanese would consider them distinct (non-equivalent) 
> values.
> When a table is declared with a UNIQUE constraint on a column, or a PRIMARY 
> KEY column, if 'ケーキ' exists in the table, Derby will throw a unique 
> constraint violation upon an attempt to insert 'けーき'.
> We need collation=TERRITORY_BASED:PRIMARY or TERRITORY_BASED:SECONDARY for 
> case-insensitivity _and_ at the same time need these values to be treated as 
> unique.  It is as if {{String.equals()}} should be used if the _lvalue_ or 
> _rvalue_ of an = operator is Japanese, but should use {{Collator.equals()}} 
> if both the _lvalue_ and _rvalue_ are "ascii-betical".  The same for 
> constraint checking.
> Is it "correct" that Derby use the collation when determining value 
> equivalency vs. ordering equivalency?
> At the same time, I understand that this is tricky.  Japanese has no 
> "upper-case" and "lower-case" for hiragana, katakana, or kanji, however they 
> do use "romaji" (roman characters) which are essentially ASCII, which is 
> case-sensitive.  Collation is merely used for ordering.  So when  
> TERRITORY_BASED:PRIMARY/SECONDARY is used, for Japanese, 'cat' and 'CAT' 
> would be equivalent but 'ケーキ' and 'けーき' _would not be_.  Unfortunately, there 
> is only one Collator and it will identify _both_ of these as equivalent.
> Taking the example further, imagine a database with 
> collation=TERRITORY_BASED:SECONDARY, and a _tags_ table without a unique 
> constraint, but containing the following values:
> {code}
> Tag
> -----------------------
> Cat
> cat
> ケーキ
> けーき
> {code}
> The following SQL should delete both cats:
> {code:sql}
> DELETE FROM tags WHERE tag='cAT'
> {code}
> But from the Japanese perspective, the following code would _erroneously_ 
> delete both cakes:
> {code:sql}
> DELETE FROM tags WHERE tag='ケーキ'
> {code}
> They consider the two expressions of the word cake distinct, but consider the 
> two cats as equivalent.  The Collator considers them all equivalent.  It is 
> as if {{String.equals()}} should be used if the _lvalue_ _or_ _rvalue_ of an 
> = operator is Japanese, and use {{Collator.equals()}} if the _lvalue_ _and_ 
> _rvalue_ are "ascii-betical".
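
The rule proposed at the end of the description (exact comparison when either 
operand contains non-ASCII characters, collator comparison when both are 
plain ASCII) can be sketched in Java as follows. This is only an illustration 
of the requested semantics applied to the four tag values above. The class 
MixedEquality and its methods are hypothetical names; Derby does not expose or 
implement anything like this, and the sketch says nothing about how such a 
rule would interact with index ordering.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch of the proposed equality rule; not Derby behavior. */
final class MixedEquality {
    /** True when every char is in the 7-bit ASCII range. */
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    /** Collator equality for ASCII-only operands, exact equality otherwise. */
    static boolean matches(Collator collator, String lvalue, String rvalue) {
        if (isAscii(lvalue) && isAscii(rvalue)) {
            return collator.equals(lvalue, rvalue);
        }
        return lvalue.equals(rvalue);
    }

    /** Returns the tags that a WHERE tag = literal predicate would select. */
    static List<String> select(Collator collator, List<String> tags, String literal) {
        List<String> hits = new ArrayList<>();
        for (String tag : tags) {
            if (matches(collator, tag, literal)) {
                hits.add(tag);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.JAPAN);
        collator.setStrength(Collator.SECONDARY);
        List<String> tags = Arrays.asList("Cat", "cat", "ケーキ", "けーき");
        System.out.println(select(collator, tags, "cAT"));
        System.out.println(select(collator, tags, "ケーキ"));
    }
}
```

Under this rule the first lookup selects both cats and the second selects 
only the katakana cake, which is the behavior the description asks for.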



--
This message was sent by Atlassian JIRA
(v6.2#6252)
