Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Asmus Freytag via Unicode Mon, 15 May 2017 10:59:34 -0700

On 5/15/2017 3:21 AM, Henri Sivonen via Unicode wrote:

> Second, the political reason:
>
> Now that ICU is a Unicode Consortium project, I think the Unicode
> Consortium should be particularly sensitive to biases arising from being
> both the source of the spec and the source of a popular
> implementation. It looks *really bad* both in terms of equal footing
> of ICU vs. other implementations for the purpose of how the standard
> is developed as well as the reliability of the standard text vs. ICU
> source code as the source of truth that other implementors need to pay
> attention to if the way the Unicode Consortium resolves a discrepancy
> between ICU behavior and a well-known spec provision (this isn't some
> ill-known corner case, after all) is by changing the spec instead of
> changing ICU *especially* when the change is not neutral for
> implementations that have made different but completely valid per
> then-existing spec and, in the absence of legacy constraints, superior
> architectural choices compared to ICU (i.e. UTF-8 internally instead
> of UTF-16 internally).
>
> I can see the irony of this viewpoint coming from a WHATWG-aligned
> browser developer, but I note that even browsers that use ICU for
> legacy encodings don't use ICU for UTF-8, so the ICU UTF-8 behavior
> isn't, in fact, the dominant browser UTF-8 behavior. That is, even
> Blink and WebKit use their own non-ICU UTF-8 decoder. The Web is the
> environment that's the most sensitive to how issues like this are
> handled, so it would be appropriate for the proposal to survey current
> browser behavior instead of just saying that ICU "feels right" or is
> "natural".

I think this political reason should be taken very seriously. There are already too many instances where ICU can be seen "driving" the development of properties and algorithms.

Those involved in the ICU project may not see the problem, but I agree with Henri that it requires a bit more sensitivity from the UTC.

A./
