[classlib][luni]difference between RI and ICU

Tony Wu Mon, 11 Sep 2006 23:21:31 -0700

I ran into a problem while implementing isWhitespace(int) in j.l.Character:
the RI and ICU behave differently.


The RI spec says:

It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
'\u2007', '\u202F').


but the ICU spec says:

It is a Unicode space separator (category "Zs"), but is not a no-break
space (\u00A0 or \u202F or \uFEFF).


So the RI excludes U+2007, whereas ICU excludes U+FEFF instead.

I looked up the definitions of these four characters on unicode.org:

00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;
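
To make the difference concrete, here is a minimal sketch that encodes only the
two quoted clauses as written, using Character.getType(int) for the general
category. The class and method names (SpaceClauseSketch, riClause, icuClause)
are made up for illustration, and icuClause follows the quoted ICU wording only;
it is not ICU's actual implementation.

```java
class SpaceClauseSketch {
    // Quoted RI clause: Zs, Zl or Zp, but not U+00A0, U+2007 or U+202F.
    static boolean riClause(int c) {
        int type = Character.getType(c);
        boolean separator = type == Character.SPACE_SEPARATOR
                || type == Character.LINE_SEPARATOR
                || type == Character.PARAGRAPH_SEPARATOR;
        boolean noBreak = c == 0x00A0 || c == 0x2007 || c == 0x202F;
        return separator && !noBreak;
    }

    // Quoted ICU clause (wording only, an assumption about intent):
    // Zs, but not U+00A0, U+202F or U+FEFF.
    static boolean icuClause(int c) {
        boolean zs = Character.getType(c) == Character.SPACE_SEPARATOR;
        boolean noBreak = c == 0x00A0 || c == 0x202F || c == 0xFEFF;
        return zs && !noBreak;
    }

    public static void main(String[] args) {
        int[] codePoints = {0x00A0, 0x2007, 0x202F, 0xFEFF};
        for (int cp : codePoints) {
            System.out.printf("U+%04X type=%d ri=%b icu=%b%n",
                    cp, Character.getType(cp), riClause(cp), icuClause(cp));
        }
    }
}
```

Under these assumptions the two clauses disagree only on U+2007. U+FEFF is in
category Cf, so it already fails the Zs test and listing it as an exclusion
changes nothing.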



I consider this a bug in ICU, because U+FEFF is not in category *Zs* as the
ICU spec describes, and I propose to report it to the ICU team.
Should we handle U+2007 ourselves to follow the RI, or just document this
problem in the test case?
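
If we go with handling U+2007 ourselves, one possible shape is a thin wrapper
around whatever whitespace test we delegate to. The sketch below is only an
illustration: FigureSpaceOverride and excludingFigureSpace are hypothetical
names, and the delegate in main is a stand-in that treats every Zs character as
whitespace, not the real ICU call.

```java
import java.util.function.IntPredicate;

class FigureSpaceOverride {
    static final int FIGURE_SPACE = 0x2007;

    // Wraps any whitespace test and forces U+2007 to be reported as
    // non-whitespace, matching the RI clause quoted above.
    static IntPredicate excludingFigureSpace(IntPredicate delegate) {
        return c -> c != FIGURE_SPACE && delegate.test(c);
    }

    public static void main(String[] args) {
        // Stand-in delegate (assumption): accepts every Zs character.
        IntPredicate standIn =
                c -> Character.getType(c) == Character.SPACE_SEPARATOR;
        IntPredicate adjusted = excludingFigureSpace(standIn);

        System.out.println("delegate U+2007: " + standIn.test(FIGURE_SPACE));
        System.out.println("adjusted U+2007: " + adjusted.test(FIGURE_SPACE));
        System.out.println("adjusted U+0020: " + adjusted.test(0x0020));
    }
}
```

The wrapper leaves every other code point to the delegate, so the special case
stays in one place and can be dropped if the underlying behaviour changes.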

--
Tony Wu
China Software Development Lab, IBM
