Bruno Haible wrote on 2000-07-27 21:45 UTC:
>   I much prefer the "garbage in - error message" way, because it
>   enables the user or sysadmin to fix the problem (read: call recode
>   on the data files). The appearance of U+FFFD is a kind of error
>   message.

Agreed. And the appearance of a U+DCxx (which in UTF-16 is not preceded
by a high surrogate) is equally "a kind of error message". Just one that
contains a bit (well, seven :-) more information.
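
To make the difference concrete, here is a minimal sketch in Python. It
assumes that Python's built-in "surrogateescape" error handler, which
maps each undecodable byte 0xNN to the lone low surrogate U+DCNN, is a
fair stand-in for the U+DCxx scheme described above; it is not code from
this thread, and the sample bytes are made up for illustration.

```python
# Hypothetical sample: 0xE9 is Latin-1 "e acute", which is malformed as UTF-8.
raw = b"caf\xe9 ok"

# "Garbage in - error message" with U+FFFD: the error is visible,
# but the offending byte value is gone for good.
lossy = raw.decode("utf-8", errors="replace")

# Same input, but the malformed byte is kept as a lone low surrogate
# (0xE9 -> U+DCE9), so the error is visible and still recoverable.
escaped = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = escaped.encode("utf-8", errors="surrogateescape")

print(repr(lossy))
print(repr(escaped))
print(restored == raw)
```

The U+FFFD version cannot be turned back into the original bytes, while
the U+DCxx version round-trips, which is the extra information meant
above.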

I see valuable binary data (PDF & ZIP files, etc.) being destroyed
almost every day by accidentally applied, stupid, lossy CRLF -> LF ->
CRLF conversions that supposedly smart software tries to perform on the
fly. I foresee similar non-recoverable data conversion accidents if we
try to establish software that wipes out malformed UTF-8 sequences
without mercy and destroys all the information that they might have
contained.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
