2017-05-15 19:54 GMT+02:00 Asmus Freytag via Unicode <unicode@unicode.org>:
> I think this political reason should be taken very seriously. There are
> already too many instances where ICU can be seen "driving" the development
> of properties and algorithms.
>
> Those involved in the ICU project may not see the problem, but I agree
> with Henri that it requires a bit more sensitivity from the UTC.

I don't think that the fact that ICU was originally using UTF-16 internally has ANY effect on the decision to represent ill-formed sequences as single or multiple U+FFFD. The internal encoding has nothing in common with the external encoding used when processing input data (which may be UTF-8, UTF-16, or UTF-32, and could in all cases present ill-formed sequences). The internal encoding plays no role in how the ill-formed input is converted, or whether it is converted at all.

So yes, independently of the internal encoding, we'll still have to choose between:

- not converting the input, and returning an error or throwing an exception;
- converting the input using a single U+FFFD (in its internal representation, this does not matter) to replace the complete sequence of ill-formed code units in the input data, and preferably returning an error status;
- converting the input using as many U+FFFD (in its internal representation, this does not matter) as needed to replace every occurrence of ill-formed code units in the input data, and preferably returning an error status.
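To make the three options concrete, here is a toy sketch (my own code, not ICU's) of a UTF-8 decoder parameterized by policy. The helper names (`decode_with_policy`, `_well_formed_length`) are invented for illustration; the well-formedness checks follow Table 3-7 of the Unicode Standard:

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def _well_formed_length(data: bytes, i: int) -> int:
    """Length of the well-formed UTF-8 sequence starting at data[i],
    or 0 if none starts there (per Unicode Table 3-7)."""
    b0 = data[i]
    if b0 <= 0x7F:
        return 1
    if 0xC2 <= b0 <= 0xDF:
        trail = [(0x80, 0xBF)]
    elif b0 == 0xE0:
        trail = [(0xA0, 0xBF), (0x80, 0xBF)]
    elif 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        trail = [(0x80, 0xBF), (0x80, 0xBF)]
    elif b0 == 0xED:  # excludes surrogates
        trail = [(0x80, 0x9F), (0x80, 0xBF)]
    elif b0 == 0xF0:
        trail = [(0x90, 0xBF), (0x80, 0xBF), (0x80, 0xBF)]
    elif 0xF1 <= b0 <= 0xF3:
        trail = [(0x80, 0xBF)] * 3
    elif b0 == 0xF4:  # excludes > U+10FFFF
        trail = [(0x80, 0x8F), (0x80, 0xBF), (0x80, 0xBF)]
    else:
        return 0
    for k, (lo, hi) in enumerate(trail, start=1):
        if i + k >= len(data) or not lo <= data[i + k] <= hi:
            return 0
    return 1 + len(trail)

def decode_with_policy(data: bytes, policy: str) -> str:
    """policy: 'error' | 'single' | 'per_unit'."""
    out, i = [], 0
    while i < len(data):
        n = _well_formed_length(data, i)
        if n:
            out.append(data[i:i + n].decode("utf-8"))
            i += n
            continue
        # Collect the maximal run of bytes starting no well-formed sequence.
        j = i + 1
        while j < len(data) and _well_formed_length(data, j) == 0:
            j += 1
        if policy == "error":
            raise ValueError(f"ill-formed UTF-8 at offset {i}")
        elif policy == "single":      # one U+FFFD for the whole run
            out.append(REPLACEMENT)
        else:                         # one U+FFFD per ill-formed code unit
            out.append(REPLACEMENT * (j - i))
        i = j
    return "".join(out)
```

For example, on the ill-formed input `b"A\xf0\x80\x80B"` the 'single' policy yields `A\ufffdB` while 'per_unit' yields `A\ufffd\ufffd\ufffdB`; either way the internal representation of the result is irrelevant to the choice.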