Markus Kuhn wrote:

> I just noticed that when I work in a UTF-8 locale (LC_CTYPE=en_GB.UTF-8),
> vim 6.0 normally opens a UTF-8 file such as

Please use Vim 6.1 for this kind of testing, with the released patches
if possible (using CVS is easiest).  Vim 6.0 is quite old now.

>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/lyrics-ipa.txt
> 
> properly in UTF-8 mode, but it deactivates UTF-8 mode when you load
> instead a file that contains malformed sequences, such as
> 
>   http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

Since this file contains byte sequences that are illegal in UTF-8, it is
converted to UTF-8 as if it were a latin1 file.  The converted text can
be edited normally.  When writing the file, the conversion is done in
reverse; thus a read command followed by a write command produces an
identical file.
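The read/write behaviour described above can be sketched in Python,
assuming a simple ordered list of candidate encodings that are tried
strictly, with latin1 last.  The names CANDIDATES, detect_and_decode and
write_back are hypothetical; this is not Vim's code, only an illustration
of why the fallback is lossless: latin1 maps each of the 256 byte values
to exactly one character, so decoding never fails and encoding back
restores the original bytes.

```python
from __future__ import annotations

# Hypothetical candidate order: strict UTF-8 first, latin1 as the fallback.
CANDIDATES = ("utf-8", "latin-1")


def detect_and_decode(raw: bytes) -> tuple[str, str]:
    """Return (encoding_used, text), trying each candidate in order."""
    for enc in CANDIDATES:
        try:
            # Strict decoding: raises UnicodeDecodeError on illegal bytes.
            return enc, raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # latin-1 accepts every byte value, so this is never reached.
    raise AssertionError("unreachable")


def write_back(enc: str, text: str) -> bytes:
    """Convert the (possibly edited) text back to the encoding it was read with."""
    return text.encode(enc)


if __name__ == "__main__":
    good = "IPA: \u0259\u02d0".encode("utf-8")
    bad = b"valid \xc3\xa9 then stray \xc0\xaf and \xfe"
    for raw in (good, bad):
        enc, text = detect_and_decode(raw)
        # A read followed by a write reproduces the original bytes exactly.
        assert write_back(enc, text) == raw
        print(enc)
```

Running it prints utf-8 for the well-formed input and latin-1 for the
input with stray bytes; in both cases the bytes written back equal the
bytes that were read.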

If you want to edit the file as if it were UTF-8, you should first filter
out the illegal byte sequences.  To manually override the detection of
the encoding, use this command:

        :edit ++enc=utf-8 UTF-8-test.txt

This is unsafe, though, because the illegal byte sequences are still in
the file you are editing.
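One way to filter out the illegal byte sequences first is sketched here
in Python, under the assumption that each bad sequence may simply be
replaced by U+FFFD (or dropped).  sanitize_utf8 is a hypothetical helper,
not part of Vim.  Unlike the latin1 fallback, this is lossy: the original
bytes cannot be recovered afterwards, so keep a copy of the file.

```python
def sanitize_utf8(raw: bytes, drop: bool = False) -> bytes:
    """Return bytes that are guaranteed to be well-formed UTF-8.

    With drop=False each illegal sequence becomes U+FFFD (assumed policy);
    with drop=True illegal bytes are removed instead.
    """
    policy = "ignore" if drop else "replace"
    return raw.decode("utf-8", errors=policy).encode("utf-8")


if __name__ == "__main__":
    raw = b"ok \xc3\xa9, bad \xc0\xaf, bad \xfe end"
    clean = sanitize_utf8(raw)
    # A strict decode of the cleaned bytes no longer raises.
    clean.decode("utf-8")
    print(sanitize_utf8(raw, drop=True))
```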

> Even worse, it also deactivates UTF-8 mode when you load a file that
> contains new Unicode 3.2 characters, such as
> 
>   http://www.cl.cam.ac.uk/~mgk25/UTF-8-demo.txt

That should be:

        http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt

I can load this file without trouble with Vim 6.1.
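Whether a file is valid UTF-8 depends only on its byte patterns, not on
which Unicode version assigned a character, so new 3.2 characters are no
reason to reject a file.  A minimal structural check, following the
well-formed byte ranges of RFC 3629, might look like this in Python;
is_well_formed_utf8 is a hypothetical helper, not taken from Vim.

```python
def is_well_formed_utf8(raw: bytes) -> bool:
    """Check UTF-8 well-formedness from byte patterns alone."""
    i, n = 0, len(raw)
    while i < n:
        b = raw[i]
        if b < 0x80:
            i += 1  # plain ASCII byte
            continue
        if 0xC2 <= b <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif 0xE0 <= b <= 0xEF:
            length = 3
            lo = 0xA0 if b == 0xE0 else 0x80  # reject overlong forms
            hi = 0x9F if b == 0xED else 0xBF  # reject UTF-16 surrogates
        elif 0xF0 <= b <= 0xF4:
            length = 4
            lo = 0x90 if b == 0xF0 else 0x80  # reject overlong forms
            hi = 0x8F if b == 0xF4 else 0xBF  # stay at or below U+10FFFF
        else:
            return False  # C0, C1, F5..FF or a stray continuation byte
        if i + length > n:
            return False  # sequence truncated at end of input
        if not lo <= raw[i + 1] <= hi:
            return False
        # Remaining bytes must all be plain continuation bytes.
        if any(not 0x80 <= c <= 0xBF for c in raw[i + 2 : i + length]):
            return False
        i += length
    return True


if __name__ == "__main__":
    # U+2060 WORD JOINER, one of the characters added in Unicode 3.2.
    print(is_well_formed_utf8("\u2060".encode("utf-8")))
    # An overlong encoding of "/" is rejected.
    print(is_well_formed_utf8(b"\xc0\xaf"))
```

The check never consults a table of assigned characters, so a newer file
passes as long as its bytes are well formed.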

> I live now on a planet where any other encoding than UTF-8 does not exist
> when I am in LC_CTYPE=en_GB.UTF-8. How do I tell vim 6.0 (and also
> emacs) to pick the encoding *strictly* based on the locale and look at
> absolutely nothing else? Falling back to ISO 8859-1 is not an option,
> because ISO 8859-1 is completely unknown on my planet.

If you only have UTF-8 files you don't need to do anything.  If you
communicate with other planets (and this message indicates you do :-)
you will have to be able to edit ISO-8859-1 files as well.

> Trying to escape the horrors and pain of automatic encoding detection in
> a pure UTF-8 environment ...

I haven't seen this planet yet.  And as soon as I see it, I'll send a
Latin1 file to it :-).  Conclusion: this UTF-8-only planet does not exist.

-- 
hundred-and-one symptoms of being an internet addict:
244. You use more than 20 passwords.

 ///  Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.moolenaar.net  \\\
///   Creator of Vim -- http://vim.sf.net -- ftp://ftp.vim.org/pub/vim   \\\
\\\           Project leader for A-A-P -- http://www.a-a-p.org           ///
 \\\ Lord Of The Rings helps Uganda - http://iccf-holland.org/lotr.html ///
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
