I have been playing with this a bit, trying to find an instance where two
values are equal with TERRITORY_BASED collation but not with
UCS_BASIC. What I found is that the Collator strength seems to be set to
IDENTICAL by default, so interesting cases like "v" = "w" in Swedish, or
accented letters in other languages, don't show up. Can anyone think of an
example where two strings will be equal with TERRITORY_BASED collation but
not with UCS_BASIC?
In my Java program I have to call setStrength(Collator.PRIMARY) to get "v"
and "w" to match:
import java.text.Collator;
import java.util.Locale;

public class TestSwedish {
    public static void main(String[] args) {
        // Compare two strings using the Swedish (sv_SE) locale
        Collator myCollator = Collator.getInstance(new Locale("sv", "SE"));
        // PRIMARY strength ignores secondary and tertiary differences
        myCollator.setStrength(Collator.PRIMARY);
        if (myCollator.compare("v", "w") == 0)
            System.out.println("v = w");
        else
            System.out.println("v != w");
    }
}
[C:/kmarsden/repro/2967] java TestSwedish
v = w
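To see at which strength the two letters stop matching, a small sketch like the following could be used. It is only an illustration: the class name StrengthLevels and the helper equalAt are made up for this example, and the result at each level depends on the collation rules shipped with the JVM, so no output is claimed here.

```java
import java.text.Collator;
import java.util.Locale;

public class StrengthLevels {
    // Hypothetical helper: true if a and b compare as equal
    // under the given locale and collator strength.
    static boolean equalAt(Locale locale, int strength, String a, String b) {
        Collator c = Collator.getInstance(locale);
        c.setStrength(strength);
        return c.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        Locale swedish = new Locale("sv", "SE");
        int[] strengths = {
            Collator.PRIMARY, Collator.SECONDARY,
            Collator.TERTIARY, Collator.IDENTICAL
        };
        String[] names = { "PRIMARY", "SECONDARY", "TERTIARY", "IDENTICAL" };
        // Print whether "v" and "w" are considered equal at each strength.
        for (int i = 0; i < strengths.length; i++) {
            System.out.println(names[i] + ": v = w? "
                + equalAt(swedish, strengths[i], "v", "w"));
        }
    }
}
```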
In Derby they don't match:
ij> connect
'jdbc:derby:sweddb;territory=sv_SE;collation=TERRITORY_BASED;create=true';
ij(CONNECTION1)> create table t (vc varchar(30));
0 rows inserted/updated/deleted
ij(CONNECTION1)> insert into t values('v');
1 row inserted/updated/deleted
ij(CONNECTION1)> insert into t values('w');
1 row inserted/updated/deleted
ij(CONNECTION1)> select vc from t where vc = 'v';
VC
------------------------------
v
1 row selected
ij(CONNECTION1)> select vc from t where vc like 'v';
VC
------------------------------
v
1 row selected
ij(CONNECTION1)>
P.S. It seems like it would be a useful enhancement to be able to set
the collatorStrength. That way we could have case-insensitive searches, etc.
http://java.sun.com/javase/6/docs/api/java/text/Collator.html
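As a rough idea of what a configurable strength would buy us, here is a plain Java sketch of a case-insensitive match done with a PRIMARY-strength Collator. This is not Derby code and not an existing Derby option; the class CaseInsensitiveMatch and the method matches are hypothetical names used only for this example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CaseInsensitiveMatch {
    // Hypothetical helper: returns the values that the collator
    // treats as equal to the key.
    static List<String> matches(List<String> values, String key,
                                Collator collator) {
        List<String> result = new ArrayList<>();
        for (String v : values) {
            if (collator.compare(v, key) == 0) {
                result.add(v);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        // PRIMARY strength ignores case differences between letters.
        collator.setStrength(Collator.PRIMARY);
        List<String> values =
            Arrays.asList("Derby", "DERBY", "derby", "darby");
        System.out.println(matches(values, "derby", collator));
    }
}
```

With the default strength the same lookup would only find the exact-case value, which is the behaviour the ij session above shows for "v" and "w".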