Re: Unicode, ISO/IEC 10646 Synchronization Issues for UTF-8

Christopher Fynn Fri, 27 Apr 2007 04:17:54 -0700

Rich Felker wrote:

On Thu, Apr 26, 2007 at 03:44:33PM +0600, Christopher Fynn wrote:

N3266

UCS Transformation Formats summary, non-error and error sequences – feedback on N3248

<http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3266.doc>

I must say this is a rather stupid looking proposal. The C0 controls
already have application-defined semantics; trying to give them a
universal meaning like this is a very bad idea. Keep in mind that
U+001A is ^Z, so for example if a terminal emulator converted bogus
UTF-8 from an X11 paste into this character, it would send (possibly
many) suspend commands to the application. Certainly not what the user
had in mind!!
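To make the ^Z concern concrete, here is a minimal sketch in Python. It is
not taken from N3266 or from any real terminal emulator: the handler name
"sub_replace" and the sample bytes are made up for illustration. It decodes
bogus UTF-8 twice, once substituting U+001A for each bad byte and once using
Python's built-in "replace" handler, which substitutes U+FFFD.

```python
import codecs


def _sub_handler(err):
    # Hypothetical error handler: replace each undecodable byte with
    # U+001A (SUB, i.e. ^Z) and resume decoding after the bad bytes.
    if isinstance(err, UnicodeDecodeError):
        return ("\u001a" * (err.end - err.start), err.end)
    raise err


codecs.register_error("sub_replace", _sub_handler)

# Made-up sample: two invalid bytes in the middle of a pasted string.
pasted = b"ls \xff\xfe -l"

with_sub = pasted.decode("utf-8", errors="sub_replace")
with_fffd = pasted.decode("utf-8", errors="replace")

# When the text is re-encoded and sent on to the application, the SUB
# variant contains raw 0x1A bytes; the U+FFFD variant contains none.
print(with_sub.encode("utf-8"))
print(with_fffd.encode("utf-8"))
```

The 0x1A bytes in the first result are indistinguishable from a ^Z typed by
the user, which is the problem described above. U+FFFD encodes to bytes that
carry no control meaning.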

Moreover, C0 and C1 control codes (minus newline and perhaps tab),
along with Unicode line/paragraph separator, should be considered
INVALID in plain text themselves. So generating them as a means of
error replacement is counterproductive as the ^Z's could be seen as
errors in themselves.
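As a rough sketch of that rule (the function name is hypothetical, and
reading "minus newline and perhaps tab" as allowing exactly LF and TAB is an
assumption), a plain-text checker might look like this in Python:

```python
def disallowed_in_plain_text(text):
    """Return (index, code point) pairs for characters the rule would reject."""
    allowed = {"\n", "\t"}  # assumption: only LF and TAB are permitted controls
    bad = []
    for i, ch in enumerate(text):
        cp = ord(ch)
        is_c0 = cp <= 0x1F                 # C0 controls, U+0000..U+001F
        is_c1 = 0x80 <= cp <= 0x9F         # C1 controls, U+0080..U+009F
        is_sep = cp in (0x2028, 0x2029)    # LINE / PARAGRAPH SEPARATOR
        if (is_c0 or is_c1 or is_sep) and ch not in allowed:
            bad.append((i, f"U+{cp:04X}"))
    return bad
```

Under such a check, text in which decoding errors had been replaced by
U+001A would itself be flagged as invalid.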

Also note that ^Z is DOS EOF. I bet some bad Windows software would
truncate files at the first ^Z...


N3266 was discussed and rejected by WG2 yesterday. As you pointed out
there are all sorts of problems with this proposal, and accepting it
would break many existing implementations.

Finally, I think the fact that this document was submitted in MS Word
form speaks for the author's qualifications (or lack thereof) to
design such a specification...

WG2 documents are all supposed to be submitted in MS Word .doc format. Fortunately, OO.o Writer can also generate this file format. I got away with submitting N3240 in PDF format generated by OO.o.


- Chris





--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
