Re: RL1.1 Hex Notation

Mark Davis ☕ Wed, 26 Jan 2011 13:37:14 -0800

Ok, now I understand. With that change, the situation is much better. It
doesn't fully satisfy RL1.1, because you can't use hex codepoint numbers --
you have to use the fairly ugly workaround of


      String hexPattern = codePoint <= 0xFFFF
          ? String.format("\\u%04x", codePoint)
          : String.format("\\u%04x\\u%04x",
              (int) Character.toChars(codePoint)[0],
              (int) Character.toChars(codePoint)[1]);
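
For reference, the workaround above can be wrapped in a small helper. This is
only a sketch: the class and method names (HexPatternSketch, toHexPattern) are
made up for illustration, and it assumes a JDK whose regex engine treats two
consecutive \uXXXX surrogate escapes as one supplementary character (the
behavior described for 7 in the quoted message below).

```java
import java.util.regex.Pattern;

class HexPatternSketch {
    // Hypothetical helper: builds a \uXXXX regex escape for a BMP code point,
    // or a pair of \uXXXX escapes (lead + trail surrogate) for a supplementary one.
    static String toHexPattern(int codePoint) {
        if (codePoint <= 0xFFFF) {
            return String.format("\\u%04x", codePoint);
        }
        char[] units = Character.toChars(codePoint); // two UTF-16 code units
        return String.format("\\u%04x\\u%04x", (int) units[0], (int) units[1]);
    }

    public static void main(String[] args) {
        int codePoint = 0x10400; // a supplementary code point, chosen arbitrarily
        String hexPattern = toHexPattern(codePoint);
        String text = new String(Character.toChars(codePoint));
        System.out.println(hexPattern);
        System.out.println(Pattern.matches(hexPattern, text));
    }
}
```

Calling Character.toChars once and indexing the result avoids the double
conversion in the one-liner, at the cost of a few more lines.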



BTW, in plain Java I really miss a few of the ICU4J routines, like:


   - char c1 = UTF16.getLeadSurrogate(codePoint);
   - char c2 = UTF16.getTrailSurrogate(codePoint);
   - String s = UTF16.valueOf(codePoint);

You can do them in plain Java, as in the above expression, but they're
awkward and not as clear to read. And instead of the third one, the best I
see in plain Java is the following, which is really pretty ugly (is there
any better way?).


   String s = new StringBuilder().appendCodePoint(codePoint).toString();
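
One possible plain-Java approximation of the three routines, built only on
Character.toChars, is sketched here. The class and method names are
hypothetical, and the sketch makes no attempt to mirror ICU4J's behavior for
BMP input: leadSurrogate simply returns the character itself and
trailSurrogate throws. Whether new String(Character.toChars(codePoint)) reads
any better than the StringBuilder form is a matter of taste.

```java
class SurrogateSketch {
    // Hypothetical stand-in for UTF16.getLeadSurrogate: first UTF-16 code unit.
    // For a BMP code point this is just the character itself.
    static char leadSurrogate(int codePoint) {
        return Character.toChars(codePoint)[0];
    }

    // Hypothetical stand-in for UTF16.getTrailSurrogate: second UTF-16 code unit.
    // Throws ArrayIndexOutOfBoundsException for BMP code points in this sketch.
    static char trailSurrogate(int codePoint) {
        return Character.toChars(codePoint)[1];
    }

    // Hypothetical stand-in for UTF16.valueOf: code point to String.
    static String valueOf(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = 0x10400; // a supplementary code point, chosen arbitrarily
        System.out.println(Integer.toHexString(leadSurrogate(codePoint)));
        System.out.println(Integer.toHexString(trailSurrogate(codePoint)));
        System.out.println(valueOf(codePoint).length());
    }
}
```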


Mark


*— Il meglio è l’inimico del bene —*


On Wed, Jan 26, 2011 at 12:47, Xueming Shen <xueming.s...@oracle.com> wrote:

> Oh, I see the problem. Obviously I have been working on jdk7 too long and
> forgot the latest release is still 6 :-( There is indeed a bug in the
> previous implementation, which I fixed in 7 a long time ago (I mentioned
> this in one of the early emails but was not specific, my apologies); it
> should probably be backported to a 6 update release asap. The test case
> runs well (the "failures" in literals are expected) on 7 with the following
> output. I modified your test case "slightly": since the UnicodeSet class in
> our normalizer package does not appear to have size(), I replaced it with a
> normal HashSet.
>
