Bruno Haible wrote on 2000-07-27 21:45 UTC:
> I much prefer the "garbage in - error message" way, because it
> enables the user or sysadmin to fix the problem (read: call recode
> on the data files). The appearance of U+FFFD is a kind of error
> message.
Agreed. And the appearance of a U+DCxx (which in UTF-16 is not preceded
by a high surrogate) is equally "a kind of error message". Just one that
contains a bit (well, seven :-) more information.
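
A minimal sketch of the difference, using Python's built-in
"surrogateescape" error handler. It is used here only as an illustration:
it maps each undecodable byte 0xNN to U+DCNN, which is assumed to be close
in spirit to the U+DCxx idea above but is not necessarily the exact scheme
discussed in this thread. The sample bytes are made up.

```python
# Hypothetical input: Latin-1 bytes handed to a decoder that expects UTF-8.
raw = b"caf\xe9 \xff ok"

# "Garbage in, U+FFFD out": every malformed byte becomes the same character.
lossy = raw.decode("utf-8", errors="replace")

# "Garbage in, U+DCxx out": each malformed byte 0xNN becomes U+DCNN,
# a lone low surrogate that cannot be mistaken for valid text.
kept = raw.decode("utf-8", errors="surrogateescape")

# Show the code points that mark the malformed bytes in each result.
print([hex(ord(c)) for c in lossy if ord(c) > 0x7F])
print([hex(ord(c)) for c in kept if ord(c) > 0x7F])

# The low seven bits of the original byte survive in the code point
# (the eighth bit is always set for a malformed byte, so nothing is lost).
low_bits = [ord(c) & 0x7F for c in kept if 0xDC80 <= ord(c) <= 0xDCFF]

# Only the U+DCxx form can be turned back into the original bytes.
restored = kept.encode("utf-8", errors="surrogateescape")
assert restored == raw
assert lossy.encode("utf-8") != raw
```

Both results still flag the problem to the user, but only the second one
lets a later tool recover what the malformed bytes actually were.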
I see valuable binary data (PDF & ZIP files, etc.) being destroyed
almost every day by accidentally applied stupid lossy CRLF -> LF -> CRLF
data conversion that supposedly smart software tries to perform on the
fly. I foresee similar non-recoverable data conversion accidents if we
try to establish software that wipes out malformed UTF-8 sequences
without mercy and destroys all information that they might have
contained.
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/