Robert Hu wrote:
Tony Wu wrote:
I encountered a problem while implementing isWhitespace(int) in j.l.Character.
There is a difference between the RI and ICU.

RI spec says,


It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or
PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0',
'\u2007', '\u202F').
Anyway, the spec is the first rule we should follow.
Information from unicode.org is also a spec, and unicode.org is the more official source.
Since the RI follows unicode.org, following the RI means we follow unicode.org as well.

but ICU spec says,

It is a Unicode space separator (category "Zs"), but is not a no-break
space (\u00A0 or \u202F or \uFEFF).

The RI excludes U+2007, whereas ICU excludes U+FEFF.

And I looked up the definition of these 4 related characters on unicode.org:

00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
2007;FIGURE SPACE;Zs;0;WS;<noBreak> 0020;;;;N;;;;;
202F;NARROW NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;;;;;
FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;
So cool... :-)
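
For anyone who wants to double-check the four entries above against the Unicode data their own JVM ships, here is a minimal sketch. The class name SpaceCategories and the helper category() are made up for illustration; only Character.getType(int) and Character.isWhitespace(int) are real API, and the printed result depends on the runtime's Unicode version.

```java
final class SpaceCategories {
    // Maps the general category reported by the runtime to its two-letter
    // Unicode abbreviation. Only the categories relevant to this thread are
    // spelled out; everything else is reported as "other".
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.SPACE_SEPARATOR:     return "Zs";
            case Character.LINE_SEPARATOR:      return "Zl";
            case Character.PARAGRAPH_SEPARATOR: return "Zp";
            case Character.FORMAT:              return "Cf";
            default:                            return "other";
        }
    }

    public static void main(String[] args) {
        // The four code points discussed in this thread.
        int[] codePoints = {0x00A0, 0x2007, 0x202F, 0xFEFF};
        for (int cp : codePoints) {
            System.out.printf("U+%04X category=%s isWhitespace=%b%n",
                    cp, category(cp), Character.isWhitespace(cp));
        }
    }
}
```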

I consider this a bug in ICU, because U+FEFF is not in category *Zs* as the
ICU spec describes. I propose to report it to the ICU team.
Should we handle U+2007 ourselves in order to follow the RI, or just document
this problem in a test case?

IMO, it's natural to follow the RI; the challenge is to fix this gracefully on top of the ICU implementation.
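
One possible shape for such a fix, as a rough sketch only and not the actual Harmony code: keep delegating to ICU and special-case the single code point where the two specs disagree in practice. The class WhitespaceAdapter and the parameter icuIsWhitespace are hypothetical names; the predicate stands in for whatever ICU-backed check the class library really calls (presumably something like ICU4J's UCharacter.isWhitespace(int), but that is an assumption here).

```java
import java.util.function.IntPredicate;

final class WhitespaceAdapter {
    // U+2007 FIGURE SPACE: a no-break space per the RI spec, but not
    // excluded by the ICU spec quoted in this thread.
    private static final int FIGURE_SPACE = 0x2007;

    // icuIsWhitespace is a hypothetical stand-in for the ICU-backed check.
    static boolean isWhitespace(int codePoint, IntPredicate icuIsWhitespace) {
        if (codePoint == FIGURE_SPACE) {
            return false; // follow the RI: U+2007 is not whitespace
        }
        // For every other code point, defer to the ICU-backed result.
        return icuIsWhitespace.test(codePoint);
    }
}
```

U+FEFF needs no special case under this assumption: both specs already treat it as non-whitespace, so only U+2007 has to be overridden.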



--
Spark Shen
China Software Development Lab, IBM


