Followup to: <[EMAIL PROTECTED]>
By author: Markus Kuhn <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> I just noticed that when I work in a UTF-8 locale (LC_CTYPE=en_GB.UTF-8),
> that vim 6.0 normally opens a UTF-8 file such as
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/lyrics-ipa.txt
>
> properly in UTF-8 mode, but it deactivates UTF-8 mode when you load
> instead a file that contains malformed sequences, such as
>
> http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
>
It needs to represent malformed sequences in some sensible way, so
that you can still do lossless binary editing.
One way is to treat each byte of a malformed sequence as a character
of its own (distinct from all real Unicode characters). This is a
mostly good approach, except that it allows the user to construct a
valid UTF-8 character out of malformed-sequence escapes. That may or
may not be a problem in any particular application, but it needs to
be taken into account, lest we get another instance of the overlong
sequence problem.
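
To make the concern concrete, here is a rough sketch in Python. It
uses Python's "surrogateescape" error handler, which is one existing
scheme of this kind: each byte of a malformed sequence becomes a lone
surrogate in the range U+DC80..U+DCFF. This is only an illustration of
the general idea, not a description of what vim does, and the helper
names decode_lossless and encode_lossless are made up for the example.

```python
def decode_lossless(data: bytes) -> str:
    # Valid UTF-8 decodes normally; each byte of a malformed sequence
    # is mapped to an escape code point U+DC80..U+DCFF.
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # Inverse mapping: escape code points turn back into raw bytes.
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # Lossless binary editing: arbitrary bytes survive a round trip.
    raw = b"abc\xff\xc3\xa9\x80"
    assert encode_lossless(decode_lossless(raw)) == raw

    # The catch: two escapes typed next to each other by the user
    # (stray bytes C3 and A9) encode to a *valid* UTF-8 sequence.
    forged = decode_lossless(b"\xc3") + decode_lossless(b"\xa9")
    saved = encode_lossless(forged)
    reloaded = decode_lossless(saved)

    print(ascii(forged))    # two escape characters in the buffer
    print(saved)            # the bytes written to disk
    print(ascii(reloaded))  # one real character after reloading
    assert reloaded != forged
```

So the byte-level round trip is exact, but the character-level one is
not: a buffer holding two escapes comes back as a single real
character. Code that checks the buffer before it is encoded will not
see that character, which is the same flavour of trouble as the
overlong sequences. Whether that matters depends on the application.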
-hpa
--
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt <[EMAIL PROTECTED]>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/