2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode <unicode@unicode.org>:
> I think this political reason should be taken very seriously. There are
> already too many instances where ICU can be seen "driving" the development
> of properties and algorithms.
>
> Those involved in the ICU project may not see the problem, but I agree
> with Henri that it requires a bit more sensitivity from the UTC.

I don't think that the fact that ICU was originally using UTF-16 internally has ANY effect on the decision to represent ill-formed sequences as single or multiple U+FFFD. The internal encoding has nothing in common with the external encoding used when processing input data (which may be UTF-8, UTF-16, or UTF-32, and could in all cases present ill-formed sequences). The internal encoding plays no role in how the ill-formed input is converted, or whether it is converted at all.

So yes, independently of the internal encoding, we'll still have to choose between:

- not converting the input, and returning an error or throwing an exception;
- converting the input using a single U+FFFD (in its internal representation, this does not matter) to replace the complete sequence of ill-formed code units in the input data, and preferably returning an error status;
- converting the input using as many U+FFFD (in its internal representation, this does not matter) as needed to replace every occurrence of ill-formed code units in the input data, and preferably returning an error status.
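To make the three options concrete, here is a toy sketch (my own code, not ICU's) of a UTF-8 decoder parameterized by policy. The helper names (`decode_with_policy`, `_well_formed_length`) are invented for illustration; the well-formedness checks follow Table 3-7 of the Unicode Standard:

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def _well_formed_length(data: bytes, i: int) -> int:
    """Length of the well-formed UTF-8 sequence starting at data[i],
    or 0 if none starts there (per Unicode Table 3-7)."""
    b0 = data[i]
    if b0 <= 0x7F:
        return 1
    if 0xC2 <= b0 <= 0xDF:
        trail = [(0x80, 0xBF)]
    elif b0 == 0xE0:
        trail = [(0xA0, 0xBF), (0x80, 0xBF)]
    elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        trail = [(0x80, 0xBF), (0x80, 0xBF)]
    elif b0 == 0xED:  # excludes surrogates
        trail = [(0x80, 0x9F), (0x80, 0xBF)]
    elif b0 == 0xF0:
        trail = [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]
    elif 0xF1 <= b0 <= 0xF3:
        trail = [(0x80, 0xBF)] * 3
    elif b0 == 0xF4:  # excludes > U+10FFFF
        trail = [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]
    else:
        return 0
    for k, (lo, hi) in enumerate(trail, start=1):
        if i + k >= len(data) or not lo <= data[i + k] <= hi:
            return 0
    return 1 + len(trail)

def decode_with_policy(data: bytes, policy: str) -> str:
    """policy: 'error' | 'single' | 'per_unit'."""
    out, i = [], 0
    while i < len(data):
        n = _well_formed_length(data, i)
        if n:
            out.append(data[i:i + n].decode("utf-8"))
            i += n
            continue
        # Collect the maximal run of bytes starting no well-formed sequence.
        j = i + 1
        while j < len(data) and _well_formed_length(data, j) == 0:
            j += 1
        if policy == "error":
            raise ValueError(f"ill-formed UTF-8 at offset {i}")
        elif policy == "single":      # one U+FFFD for the whole run
            out.append(REPLACEMENT)
        else:                         # one U+FFFD per ill-formed code unit
            out.append(REPLACEMENT * (j - i))
        i = j
    return "".join(out)
```

For example, on the ill-formed input `b"A\xf0\x80\x80B"` the 'single' policy yields `A\ufffdB` while 'per_unit' yields `A\ufffd\ufffd\ufffdB`; either way the internal representation of the result is irrelevant to the choice.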