Re: Unicode, ISO/IEC 10646 Synchronization Issues for UTF-8

Rich Felker Thu, 26 Apr 2007 16:34:06 -0700

On Thu, Apr 26, 2007 at 03:44:33PM +0600, Christopher Fynn wrote:
> N3266
> 
> UCS Transformation Formats summary, non-error and error sequences – 
> feedback on N3248
> 
> <http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3266.doc>


I must say this is a rather stupid-looking proposal. The C0 controls
already have application-defined semantics; trying to give them a
universal meaning like this is a very bad idea. Keep in mind that
U+001A is ^Z, the default suspend character on a Unix tty. So if, for
example, a terminal emulator converted bogus UTF-8 from an X11 paste
into this character, it would send (possibly many) suspend characters
to the application. Certainly not what the user had in mind!
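
To make the paste scenario above concrete, here is a minimal Python
sketch. It assumes the substitution works one U+001A per undecodable
span; the handler name "sub_on_error" and the function decode_paste
are hypothetical and exist only for this illustration, not in any
terminal emulator or in the proposal itself.

```python
import codecs

SUB = "\x1a"  # U+001A, i.e. ^Z


def sub_on_error(exc):
    # Hypothetical error handler: replace each undecodable span with
    # U+001A and resume decoding right after it.
    if isinstance(exc, UnicodeDecodeError):
        return (SUB, exc.end)
    raise exc


codecs.register_error("sub_on_error", sub_on_error)


def decode_paste(data: bytes) -> str:
    # Decode pasted bytes as UTF-8, substituting ^Z for bogus input.
    return data.decode("utf-8", errors="sub_on_error")


# Two stray bytes in the paste become two ^Z characters in the text
# that would then be fed to the application.
pasted = b"ls \xff\xfe -l"
text = decode_paste(pasted)
suspend_count = text.count(SUB)
```

Every invalid byte in the paste turns into one more ^Z on the
application's input, which is where the "possibly many" comes from.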

Moreover, C0 and C1 control codes (minus newline and perhaps tab),
along with the Unicode line and paragraph separators (U+2028 and
U+2029), should themselves be considered INVALID in plain text. So
generating them as a means of error replacement is counterproductive:
the ^Z's could be seen as errors in themselves.
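
A rough sketch of the plain-text rule described above, in Python. The
function name invalid_plain_text_chars is made up for illustration,
and allowing tab is an assumption (the "perhaps" above); adjust
ALLOWED_CONTROLS to taste.

```python
# Controls tolerated in plain text: newline, and (by assumption) tab.
ALLOWED_CONTROLS = {"\n", "\t"}
# Unicode line separator and paragraph separator.
SEPARATORS = {"\u2028", "\u2029"}


def invalid_plain_text_chars(text):
    """Return (index, code point) pairs for characters that the rule
    above treats as invalid in plain text."""
    bad = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        is_c0 = cp <= 0x1F           # C0 controls: U+0000..U+001F
        is_c1 = 0x80 <= cp <= 0x9F   # C1 controls: U+0080..U+009F
        is_control = (is_c0 or is_c1) and ch not in ALLOWED_CONTROLS
        if is_control or ch in SEPARATORS:
            bad.append((i, cp))
    return bad
```

Under this rule a U+001A produced by error replacement is flagged just
like any other stray control, so the replacement character is itself
reported as an error.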

Also note that ^Z is the DOS end-of-file marker. I bet some bad
Windows software would truncate files at the first ^Z...

Finally, I think the fact that this document was submitted in MS Word
form speaks for the author's qualifications (or lack thereof) to
design such a specification...

Rich

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
