Re: Feedback on the proposal to change U+FFFD generation when decoding ill-formed UTF-8

Hans Åberg via Unicode Tue, 16 May 2017 06:28:42 -0700

> On 16 May 2017, at 15:00, Philippe Verdy <verd...@wanadoo.fr> wrote:
> 
> 2017-05-16 14:44 GMT+02:00 Hans Åberg via Unicode <unicode@unicode.org>:
> 
> > On 15 May 2017, at 12:21, Henri Sivonen via Unicode <unicode@unicode.org> 
> > wrote:
> ...
> > I think Unicode should not adopt the proposed change.
> 
> It would be useful, for use with filesystems, to have Unicode codepoint 
> markers that indicate how UTF-8, including non-valid sequences, is translated 
> into UTF-32 in a way that the original octet sequence can be restored.
> 
> Why just UTF-32 ?


Synonym for codepoint numbers. It would suffice to add markers indicating how 
it was translated. For example, codepoints meaning "overlong, length <number>", 
"byte", or whatever is useful.

> How would you convert ill-formed UTF-8/UTF-16/UTF-32 to valid 
> UTF-8/UTF-16/UTF-32 ?

You don't. You have a filename, which is an octet sequence of unknown encoding, 
and want to deal with it. Therefore, valid Unicode transformations of the 
filename may result in it no longer being reachable.

It only matters that the correct octet sequence is handed back to the 
filesystem. All current filesystems, as far as experts could recall, use octet 
sequences at the lowest level; whatever encoding is used is built in a layer 
above.
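
As a rough illustration of the round-trip requirement (not the marker scheme 
suggested above, just an existing mechanism of the same kind): Python's 
"surrogateescape" error handler (PEP 383) maps each undecodable byte to a lone 
surrogate in U+DC80..U+DCFF, which acts as a "byte" marker, so the original 
octets can be restored. The function names and the sample filename below are 
made up for the example.

```python
def lossy_roundtrip(raw: bytes) -> bytes:
    """Decode with U+FFFD replacement, then encode again.

    Ill-formed bytes become U+FFFD, so the original octets are lost.
    """
    text = raw.decode("utf-8", errors="replace")
    return text.encode("utf-8")


def lossless_roundtrip(raw: bytes) -> bytes:
    """Decode with per-byte markers, then encode again.

    Each undecodable byte 0xNN becomes the lone surrogate U+DCNN, and the
    same handler turns it back into the byte 0xNN when encoding.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


# Hypothetical filename: Latin-1 "café.txt", which is ill-formed as UTF-8.
name = b"caf\xe9.txt"

# U+FFFD replacement hands back a different octet sequence, so the file
# would no longer be reachable under that name.
assert lossy_roundtrip(name) != name

# The marker approach hands back exactly the original octets.
assert lossless_roundtrip(name) == name
```

Python's os.fsdecode() and os.fsencode() apply this handler on POSIX systems 
for the same reason: the string form is only a carrier, and the filesystem 
gets back the octets it gave out.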
