On Thu, Apr 26, 2007 at 03:44:33PM +0600, Christopher Fynn wrote:
> N3266
>
> UCS Transformation Formats summary, non-error and error sequences –
> feedback on N3248
>
> <http://std.dkuug.dk/jtc1/sc2/wg2/docs/N3266.doc>

I must say this is a rather stupid-looking proposal. The C0 controls
already have application-defined semantics; trying to give them a
universal meaning like this is a very bad idea. Keep in mind that
U+001A is ^Z, so, for example, if a terminal emulator converted bogus
UTF-8 from an X11 paste into this character, it would send (possibly
many) suspend commands to the application. Certainly not what the user
had in mind!

Moreover, the C0 and C1 control codes (minus newline and perhaps tab),
along with the Unicode line and paragraph separators, should themselves
be considered INVALID in plain text. So generating them as a means of
error replacement is counterproductive, as the ^Z's could be seen as
errors in themselves.

Also note that ^Z is DOS EOF. I bet some bad Windows software would
truncate files at the first ^Z...

Finally, I think the fact that this document was submitted in MS Word
form speaks for the author's qualifications (or lack thereof) to design
such a specification...

Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
