[
https://issues.apache.org/jira/browse/DERBY-6607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027692#comment-14027692
]
Rick Hillegas commented on DERBY-6607:
--------------------------------------
Hi Brett,
I agree with Knut that you will need a custom collator for this problem. It
sounds like your out-of-the-box collator isn't subtle enough for Japanese. A
collator is supposed to impose a linear order, and you are up against the
anti-symmetric law of linear orders (http://en.wikipedia.org/wiki/Total_order).
You have a situation where your collator is asserting that "word1 <= word2" and
"word2 <= word1", which by antisymmetry means the two words are equal. One of
those assertions has to be false in order for you to get the behavior you want.
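
To make that concrete, here is a rough sketch of an ordering that breaks the
tie between hiragana and katakana while leaving case differences alone. It is
only an illustration under assumptions, not Derby code: the class name
KanaAwareComparator and its kanaSignature helper are made up for this example,
and it implements Comparator<String> rather than extending java.text.Collator,
so it would still have to be wrapped in a real Collator before Derby could use
it.

```java
import java.text.Collator;
import java.util.Comparator;
import java.util.Locale;

/**
 * Hypothetical tie-breaking comparator (illustration only).
 * It defers to a base Collator and, only when the base reports a tie,
 * orders the strings by which kana script each character belongs to.
 */
final class KanaAwareComparator implements Comparator<String> {
    private final Collator base;

    KanaAwareComparator(Collator base) {
        this.base = base;
    }

    @Override
    public int compare(String a, String b) {
        int c = base.compare(a, b);
        if (c != 0) {
            return c; // the base collator already distinguishes the two
        }
        // Tie at the base strength: compare the per-character script
        // signatures so that hiragana and katakana spellings differ.
        return kanaSignature(a).compareTo(kanaSignature(b));
    }

    /** Maps each code point to 'H' (hiragana), 'K' (katakana) or '_' (other). */
    static String kanaSignature(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            if (script == Character.UnicodeScript.HIRAGANA) {
                sb.append('H');
            } else if (script == Character.UnicodeScript.KATAKANA) {
                sb.append('K');
            } else {
                sb.append('_');
            }
        });
        return sb.toString();
    }
}
```

With a PRIMARY-strength base collator, 'Cat' and 'cat' still compare as equal
(neither contains kana, so their signatures match), while the two spellings of
'cake' no longer do, so only one of "word1 <= word2" and "word2 <= word1"
holds for them.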
Hope this helps,
-Rick
> Derby is using territory/collation for equality, not just ordering
> (incorrectly?)
> ---------------------------------------------------------------------------------
>
> Key: DERBY-6607
> URL: https://issues.apache.org/jira/browse/DERBY-6607
> Project: Derby
> Issue Type: Bug
> Components: Localization
> Affects Versions: 10.10.2.0
> Reporter: Brett Wooldridge
>
> We have a database where we wish case-insensitivity, and therefore it was
> created with collation=TERRITORY_BASED:PRIMARY. We have customers in both
> the United States (en_US) and in Japan (ja_JP).
> We have an issue in Japan. Japanese has three character sets: hiragana,
> katakana, and kanji. Hiragana is a phonetic alphabet with 46 letters.
> Katakana is an identical phonetic alphabet with 46 letters, written using
> different character forms, and used for foreign words (words adopted from
> other languages into Japanese).
> Here is the word 'cake' written in katakana: ケーキ (ke- ki)
> Here is the word 'cake' written in hiragana: けーき (ke- ki)
> In terms of collation (ordering), Japanese consider these to be equal. So,
> in the following Java code, the call to 'compare()' would return 0:
> {code:java}
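> import java.text.Collator;
> import java.util.Locale;
> // Fragment: the last line is assumed to sit inside a method returning int.
> Collator collator = Collator.getInstance(Locale.JAPAN);
> collator.setStrength(Collator.PRIMARY);
> return collator.compare("ケーキ", "けーき");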
> {code}
> And therein lies the issue. With respect to _ordering_ they are indeed
> equivalent, however Japanese would consider them distinct (non-equivalent)
> values.
> When a table is declared with a UNIQUE constraint on a column, or a PRIMARY
> KEY column, if 'ケーキ' exists in the table, Derby will throw a unique
> constraint violation upon an attempt to insert 'けーき'.
> We need collation=TERRITORY_BASED:PRIMARY or TERRITORY_BASED:SECONDARY for
> case-insensitivity _and_ at the same time need these values to be treated as
> unique. It is as if {{String.equals()}} should be used if the _lvalue_ or
> _rvalue_ of an = operator is Japanese, but should use {{Collator.equals()}}
> if both the _lvalue_ and _rvalue_ are "ascii-betical". The same for
> constraint checking.
> Is it "correct" that Derby use the collation when determining value
> equivalency vs. ordering equivalency?
> At the same time, I understand that this is tricky. Japanese has no
> "upper-case" and "lower-case" for hiragana, katakana, or kanji, however they
> do use "romaji" (roman characters) which are essentially ASCII, which is
> case-sensitive. Collation is merely used for ordering. So when
> TERRITORY_BASED:PRIMARY/SECONDARY is used, for Japanese, 'cat' and 'CAT'
> would be equivalent but 'ケーキ' and 'けーき' _would not be_. Unfortunately, there
> is only one Collator and it will identify _both_ of these as equivalent.
> Taking the example further, imagine a database with
> collation=TERRITORY_BASED:SECONDARY, and _tags_ table without a unique
> constraint, but containing the following values:
> {code}
> Tag
> -----------------------
> Cat
> cat
> ケーキ
> けーき
> {code}
> The following SQL should delete both cats:
> {code:sql}
> DELETE FROM tags WHERE tag='cAT'
> {code}
> But from the Japanese perspective, the following code would _erroneously_
> delete both cakes:
> {code:sql}
> DELETE FROM tags WHERE tag='ケーキ'
> {code}
> They consider the two expressions of the word cake distinct, but consider the
> two cats as equivalent. The Collator considers them all equivalent. It is
> as if {{String.equals()}} should be used if the _lvalue_ _or_ _rvalue_ of an
> = operator is Japanese, and use {{Collator.equals()}} if the _lvalue_ _and_
> _rvalue_ are "ascii-betical".
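
For reference, the equality rule proposed at the end of the description could
be sketched as follows. This only illustrates the proposal and is not how Derby
evaluates '='. HybridEquality, isAscii and valuesEqual are hypothetical names,
and "Japanese" is approximated here as "contains any non-ASCII character",
which is an assumption of the sketch.

```java
import java.text.Collator;
import java.util.Locale;

/** Hypothetical illustration of the proposed "hybrid" equality rule. */
final class HybridEquality {
    private final Collator collator;

    HybridEquality(Collator collator) {
        this.collator = collator;
    }

    /** True if every UTF-16 code unit of the string is in the ASCII range. */
    static boolean isAscii(String s) {
        return s.chars().allMatch(ch -> ch < 0x80);
    }

    /**
     * Uses the collator only when both operands are pure ASCII;
     * otherwise falls back to exact String.equals().
     */
    boolean valuesEqual(String lvalue, String rvalue) {
        if (isAscii(lvalue) && isAscii(rvalue)) {
            return collator.equals(lvalue, rvalue);
        }
        return lvalue.equals(rvalue);
    }
}
```

With a PRIMARY-strength collator this treats 'cAT' and 'Cat' as equal and the
two spellings of 'cake' as different. It only covers equality, though, so it
does not by itself give a collator a consistent ordering for the two cakes.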