I have been playing with this a bit, trying to find an instance where two values are equal with TERRITORY_BASED collation but not with UCS_BASIC. What I found is that the Collator strength seems to be set to IDENTICAL by default, so interesting cases like "v" = "w" in Swedish, or accented letters in other languages, don't show up. To get these matches in a Java program I have to set the Collator strength to PRIMARY.

Can anyone think of an example where two strings will be equal with TERRITORY_BASED collation but not with UCS_BASIC?


In my Java program I have to call setStrength(Collator.PRIMARY) to get "v" and "w" to match:

import java.text.Collator;
import java.util.Locale;

public class TestSwedish {

   public static void main(String[] args) {
       // Compare two strings using the Swedish (sv_SE) collation rules
       Collator myCollator = Collator.getInstance(new Locale("sv", "SE"));
       // PRIMARY strength ignores secondary and tertiary differences
       myCollator.setStrength(Collator.PRIMARY);
       if (myCollator.compare("v", "w") == 0)
           System.out.println("v = w");
       else
           System.out.println("v != w");
   }
}

[C:/kmarsden/repro/2967] java TestSwedish
v = w

In Derby they don't match:
ij> connect 'jdbc:derby:sweddb;territory=sv_SE;collation=TERRITORY_BASED;create=true';
ij(CONNECTION1)> create table t (vc varchar(30));
0 rows inserted/updated/deleted
ij(CONNECTION1)> insert into t values('v');
1 row inserted/updated/deleted
ij(CONNECTION1)> insert into t values('w');
1 row inserted/updated/deleted
ij(CONNECTION1)> select vc from t where vc = 'v';
VC
------------------------------
v

1 row selected
ij(CONNECTION1)> select vc from t where vc like 'v';
VC
------------------------------
v

1 row selected
ij(CONNECTION1)>


P.S. It seems like it would be a useful enhancement to be able to set the Collator strength. That way we could support case-insensitive searches, etc.
http://java.sun.com/javase/6/docs/api/java/text/Collator.html
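
To illustrate what a configurable strength could buy us, here is a rough sketch using plain java.text.Collator. It is not Derby code, and the class name CollatorStrengthSketch and the helper equalsAtStrength are made up for this example. It assumes the JDK's standard English rules, where case is a tertiary difference and accents are a secondary one.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorStrengthSketch {

    // Hypothetical helper: true when the two strings compare as equal
    // under the given locale at the given Collator strength.
    static boolean equalsAtStrength(Locale locale, int strength, String a, String b) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Locale en = Locale.ENGLISH;

        // Case only: a tertiary difference, so SECONDARY should ignore it
        // and TERTIARY should not.
        System.out.println("derby/DERBY at SECONDARY: "
                + equalsAtStrength(en, Collator.SECONDARY, "derby", "DERBY"));
        System.out.println("derby/DERBY at TERTIARY:  "
                + equalsAtStrength(en, Collator.TERTIARY, "derby", "DERBY"));

        // Accent only: a secondary difference, so it should only be
        // ignored at PRIMARY.
        System.out.println("e/\u00e9 at PRIMARY:   "
                + equalsAtStrength(en, Collator.PRIMARY, "e", "\u00e9"));
        System.out.println("e/\u00e9 at SECONDARY: "
                + equalsAtStrength(en, Collator.SECONDARY, "e", "\u00e9"));
    }
}
```

So a case-insensitive but accent-sensitive search would correspond to SECONDARY, and the "v" = "w" style of match to PRIMARY. How such a setting would be exposed in Derby is an open question.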

