Followup to:  <[EMAIL PROTECTED]>
By author:    Markus Kuhn <[EMAIL PROTECTED]>
In newsgroup: linux.utf8
>
> I just noticed that when I work in a UTF-8 locale (LC_CTYPE=en_GB.UTF-8),
> that vim 6.0 normally opens a UTF-8 file such as
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/lyrics-ipa.txt
> 
> properly in UTF-8 mode, but it deactivates UTF-8 mode when you load
> instead a file that contains malformed sequences, such as
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
> 

It needs to do something sensible to encode malformed sequences, so
you can do lossless binary editing.

One way is to treat each byte of a malformed sequence as a character
(different from all real Unicode characters).  This is mostly a good
approach, except that it allows the user to construct a valid UTF-8
character out of malformed sequence escapes -- this may or may not be
a problem in any particular application, but it needs to be taken into
account, lest we get another instance of the overlong sequence
problem.
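
To make the hazard concrete, here is a rough sketch in Python.  It is
not how vim does it; it only assumes Python's "surrogateescape" error
handler, which maps each undecodable byte to a lone surrogate in
U+DC80..U+DCFF and so behaves like the per-byte escape described
above.  The helper names decode_lossless/encode_lossless and the
sample bytes are made up for the illustration.

```python
def decode_lossless(data: bytes) -> str:
    # Each byte that is not part of a well-formed UTF-8 sequence becomes
    # one escape character (a lone surrogate U+DC80..U+DCFF).
    return data.decode("utf-8", errors="surrogateescape")


def encode_lossless(text: str) -> bytes:
    # Escape characters are written back as the original raw bytes.
    return text.encode("utf-8", errors="surrogateescape")


# 0xC3 followed by a space and a lone 0xA9 are both malformed.
raw = b"ok \xc3 \xa9 end"
text = decode_lossless(raw)

# Lossless binary editing: an unmodified buffer round-trips exactly.
assert encode_lossless(text) == raw

# The hazard: the user deletes the space between the two escapes.
joined = text.replace("\udcc3 \udca9", "\udcc3\udca9")
out = encode_lossless(joined)

# The two escaped bytes now sit side by side and form a valid UTF-8
# sequence (0xC3 0xA9 is U+00E9), although the user never typed it.
assert out.decode("utf-8") == "ok \u00e9 end"
```

Whether that matters depends on the application: a text editor may be
happy with it, while anything that filters on the decoded characters
before writing the bytes back out would probably want to check the
re-encoded output as well.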

        -hpa
-- 
<[EMAIL PROTECTED]> at work, <[EMAIL PROTECTED]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt    <[EMAIL PROTECTED]>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
