After a little further study: the collation is defined in CLDR. Please refer to the data for the locale "es" [1]. It contains a block describing the traditional collation, part of which I quote below [2]. Let me try to explain this definition.
First, the term "traditional" is explicitly defined. You can also find it in UTS#35 [3], which says "For a traditional-style sort (as in Spanish)".

Second, the data [2] indicates that the rule in the traditional Spanish locale should be ... C<ch<<<Ch<<<CH. The tag <p> means "primary", which is to say that "ch" is a base character.

The conclusion is that there IS a traditional Spanish collation rule which has a key "ch". The question is: is it necessary for Harmony to support it, or should we just match the behavior of the RI?

[1] http://www.unicode.org/repository/*checkout*/cldr/common/collation/es.xml?rev=1.21

[2]
<collation type="traditional">
  <rules>
    ...
    <reset>C</reset>
    <p>ch</p>
    <t>Ch</t>
    <t>CH</t>
    ...
  </rules>
</collation>

[3] http://www.unicode.org/reports/tr35/

On 2/20/08, Alexei Zakharov <[EMAIL PROTECTED]> wrote:
> ¡Buenos días!
>
> :) No, I'm not an expert in Spanish. But after reading your post I got
> the impression that we support an additional variant of the Spanish
> language compared to the RI. However, I tried to find something about
> the traditional Spanish variant in the ICU locale browser and found
> nothing. I believe we should learn more about this problem before
> making any decision.
>
> Regards,
> Alexei
>
> 2008/2/19, Tony Wu <[EMAIL PROTECTED]>:
> > Hi, all
> >
> > I'm investigating the regression[1] in the text module. Actually these
> > 5 failures come down to one cause: the support of the traditional
> > Spanish character "ch". Following is my understanding.
> >
> > My fix for HARMONY-5465 makes Locale.toString() compatible with the
> > RI. Before my commit, the toString() of a Locale with an empty
> > "country" field had only one underscore in its output, whereas the RI
> > has two. For instance, new Locale("es","","TRADITIONAL").toString()
> > returned "es_TRADITIONAL" in Harmony but "es__TRADITIONAL" in the RI.
> > Interestingly, ICU uses the output of toString() as the keyword that
> > identifies its Locale instance.
> > That is to say, the 5 test cases passed before because they were not
> > being run in the real traditional Spanish locale, so the character
> > "ch" was interpreted as two separate characters, "c" and "h". That is
> > why we could set the offset to 1 in our test cases. After my commit,
> > ICU finds the right Spanish locale, so its behavior is compatible
> > with the spec[2].
> >
> > One strange thing is that I cannot get the traditional Spanish locale
> > in the RI. The RI behaves the same no matter whether there is a
> > variant "TRADITIONAL" or not. The spec does not say anything about
> > "traditional", but I found via Google that since 1998 the character
> > "ch" has been dropped in Spanish. I suppose that the RI changed the
> > behavior of the Spanish locale but forgot to modify the spec
> > accordingly.
> >
> > BTW, for the normal Spanish Locale (new Locale("es","ES")), we have
> > the same behavior as the RI. It seems ICU supports traditional Spanish
> > in the form of new Locale("es","","TRADITIONAL") but the RI does not.
> > Run the test case below[3] on the RI to see the difference.
> >
> > Is there any expert familiar with Spanish here? I need your advice.
> >
> > [1]
> > http://people.apache.org/~smishura/r628209/Windows_x86/classlib-test/
> >
> > [2]
> > The spec says:
> > For example, consider the following in Spanish:
> >
> > "ca" -> the first key is key('c') and second key is key('a').
> > "cha" -> the first key is key('ch') and second key is key('a').
> >
> >
> > [3]
> > // imports needed: java.text.Collator, java.text.RuleBasedCollator,
> > // java.text.CollationElementIterator, java.util.Locale
> >
> > // Count the collation elements the RI (java.text) produces for "cha".
> > RuleBasedCollator rbColl = (RuleBasedCollator) Collator
> >         .getInstance(new Locale("es", "", "TRADITIONAL"));
> > String text = "cha";
> > CollationElementIterator iterator = rbColl
> >         .getCollationElementIterator(text);
> > int keyNum = 0;
> > while (iterator.next() != CollationElementIterator.NULLORDER) {
> >     keyNum++;
> > }
> > System.out.println("RI has " + keyNum + " keys");
> >
> > // Do the same with ICU's collator for the same locale.
> > com.ibm.icu.text.RuleBasedCollator r =
> >         (com.ibm.icu.text.RuleBasedCollator) com.ibm.icu.text.Collator
> >                 .getInstance(new Locale("es", "", "TRADITIONAL"));
> > com.ibm.icu.text.CollationElementIterator it = r
> >         .getCollationElementIterator(text);
> > keyNum = 0;
> > while (it.next() != com.ibm.icu.text.CollationElementIterator.NULLORDER) {
> >     keyNum++;
> > }
> > System.out.println("ICU has " + keyNum + " keys");
> >
> >
> > The output is:
> > RI has 3 keys
> > ICU has 2 keys
> >
> >
> > --
> > Tony Wu
> > China Software Development Lab, IBM
>

--
Tony Wu
China Software Development Lab, IBM
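
P.S. For anyone who wants to see the two effects discussed in this thread without ICU on the classpath, here is a minimal sketch using only java.text. It is an illustration under assumptions, not the Harmony or ICU implementation: the class name, the method names and the tiny toy alphabets are all made up for this example, and the rule strings use java.text syntax ('<' primary, ',' tertiary, '&' reset) rather than the ICU/CLDR syntax quoted above. The second rule string appends a reset at "C" followed by a primary "ch" key, roughly mirroring the CLDR fragment.

```java
import java.text.CollationElementIterator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class; names and rule strings are illustrative only.
class ChContractionSketch {

    // Toy alphabet in which "ch" is just 'c' followed by 'h'.
    static final String PLAIN_RULES = "< a < c , C < d < h , H";

    // Same toy alphabet, plus a reset at 'C' that introduces "ch" as its
    // own primary key, with "Ch" and "CH" as tertiary variants of it.
    static final String TRADITIONAL_RULES = PLAIN_RULES + " & C < ch , Ch , CH";

    // Counts the collation elements (keys) produced for the given text.
    static int countElements(String rules, String text) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            CollationElementIterator it =
                    collator.getCollationElementIterator(text);
            int count = 0;
            while (it.next() != CollationElementIterator.NULLORDER) {
                count++;
            }
            return count;
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    // The locale name discussed in the thread: empty country, variant set.
    static String traditionalSpanishName() {
        return new Locale("es", "", "TRADITIONAL").toString();
    }

    public static void main(String[] args) {
        System.out.println("plain rules:       "
                + countElements(PLAIN_RULES, "cha") + " keys");
        System.out.println("traditional rules: "
                + countElements(TRADITIONAL_RULES, "cha") + " keys");
        System.out.println("locale name:       " + traditionalSpanishName());
    }
}
```

With the toy rules, "cha" should yield three keys without the contraction and two with it, which is the same 3-versus-2 difference as in the RI/ICU output above. The locale name shows the double underscore that the RI produces for an empty country field.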
