Ok, now I understand. With that change, the situation is much better. It doesn't fully satisfy RL1.1, because you can't use hex codepoint numbers -- you have to use the fairly ugly workaround of
    String hexPattern = codePoint <= 0xFFFF
            ? String.format("\\u%04x", codePoint)
            : String.format("\\u%04x\\u%04x",
                    (int) Character.toChars(codePoint)[0],
                    (int) Character.toChars(codePoint)[1]);

BTW, in plain Java I really miss a few of the ICU4J routines, like:

- char c1 = UTF16.getLeadSurrogate(codePoint);
- char c2 = UTF16.getTrailSurrogate(codePoint);
- String s = UTF16.valueOf(codePoint);

You can do them in plain Java, as in the above expression, but they're awkward and not as clear to read. And instead of the third one, the best I see in plain Java is the following, which is really pretty ugly (is there any better way?).

    String s = new StringBuilder().appendCodePoint(codePoint).toString();

Mark

*— Il meglio è l’inimico del bene —*

On Wed, Jan 26, 2011 at 12:47, Xueming Shen <xueming.s...@oracle.com> wrote:

> Oh, I see the problem. Obviously I have been working on jdk7 too long and
> forgot the latest release is still 6:-( There is indeed a bug in the previous
> implementation which I fixed in 7 long time ago (I mentioned this in one of
> the early emails but was not specific, my apology), probably should backport
> to 6 update release asap. The test case runs well (the "failures" in literals
> are expected) on 7 with the following output. I modified your test case
> "slightly" since it appears the UnicodeSet class in our normalizer package
> does not have the size(), replace it with a normal hashset.
>