Re: gvim and Unicode

A.J.Mechelynck Fri, 26 Jan 2007 15:16:01 -0800

Jon Noring wrote:

I've been a long-time user of vi editors on Windows (lemmy and an older
version of vim) and now am looking for a vi editor for Windows that supports
the Unicode encodings (such as UTF-8, UTF-16, etc.)


So I installed the latest gvim, version 7, but am disappointed that on my
system at least (Windows XP), it doesn't recognize UTF-8 documents, so
characters outside of the ASCII range are not being rendered properly (it
appears gvim assumes the documents are ISO-8859 encoded.) In addition, in
the documentation and menus, I see nothing mentioned about Unicode, UTF-8
encoding, etc.

So what's going on? I was under the impression that in gvim I'd have a UTF-8
capable editor.

Thanks!

Jon Noring

gvim does support Unicode, but it may be easier or harder depending on your OS
and its settings. The easiest is of course if you start gvim in a Unicode
locale, or, on Unix, if you run a version compiled for the GTK2 toolkit (which
uses Unicode by default). Here is a code snippet which you can paste into your
vimrc to enable support for Unicode in all versions which have Unicode support
compiled-in.


if has("multi_byte")    " if not, we need to recompile
  if &enc !~? '^u'      " if 'encoding' already starts with u or U, then
                        " Unicode is already set and this block is skipped
    if &tenc == ''
      let &tenc = &enc  " save the keyboard charset
    endif
    set enc=utf-8       " to support Unicode fully, we need to be able
                        " to represent all Unicode codepoints in memory
  endif
  set fencs=ucs-bom,utf-8,latin1
  setg bomb             " default for new Unicode files
  setg fenc=latin1      " default for files created from scratch
else
  echomsg 'Warning: Multibyte support is not compiled-in.'
endif

You must also set a 'guifont' which includes the glyphs you will need, but
most fonts don't cover the whole range of "assigned" Unicode codepoints from
U+0000 (well, U+0020 since 0-1F are not "printable") to U+10FFFF (well,
U+10FFFD since anything ending in FFFE or FFFF is invalid). If you are like
me, you will have to set different fonts at different times depending on what
languages you're editing at any particular moment. Courier New has (in my
experience) a wide coverage for "alphabetic" languages (Latin, Greek,
Cyrillic, Hebrew, Arabic); for Far Eastern scripts you will need some other
font such as FZ FangSong or MingLiU.
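As a rough sketch of the font setting described above, something like the
following could go in a vimrc. The font names and sizes are only assumptions
for illustration (they must match fonts actually installed on your system),
and the value syntax shown is the one used by gvim on Windows:

```vim
" Assumption: Courier New is installed; the size (h10) is arbitrary.
if has("gui_running") && has("win32")
  " Default font with wide coverage for 'alphabetic' scripts
  set guifont=Courier_New:h10:cDEFAULT
endif
```

To switch fonts while editing, for example before working on Chinese text, you
could type ":set guifont=MingLiU:h12:cDEFAULT" (again, name and size are
assumptions), or use ":set guifont=*" to pick a font from a dialog.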


With the above settings, Unicode files will be recognised when possible:

- Any file starting with a BOM will be properly recognised as the appropriate
Unicode encoding (out of, IIUC, UTF-8, UTF-16be, UTF-16le, UTF-32be and
UTF-32le).

- Files with no BOM will still be recognised as UTF-8 if they include nothing
that is invalid in UTF-8.

- Fallback is to Latin1.

- The above means that 7-bit US-ASCII will be diagnosed as UTF-8; this is not
a problem as long as you don't add to them any characters with the high bit
set, since the codepoints U+0000 to U+007F have both the same meaning and the
same representation in ASCII and UTF-8. The first time you add a character
above 0x7F to such a file, you will have to save it with, for instance,


        :setlocal fenc=latin1
        :w

if you want it to be encoded in Latin1. From then on, the file (containing one
or more bytes with high bit set in combinations invalid in UTF-8) will be
recognised as Latin1 by the 'fileencodings' heuristics set above.

- It also means that for non-UTF-8 Unicode files with no BOM, or in general
for anything not autodetected (such as 8-bit files other than Latin1), you
will have to specify the encoding yourself
(e.g. ":e ++enc=utf-16le filename.txt").
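To check what the heuristics decided for the current buffer, and to override
them when they guessed wrong, commands along these lines can be used. The file
name and the cp1252 encoding are just examples, not something your files
necessarily use:

```vim
" Show the encoding detected for the current file and whether it has a BOM
:setlocal fileencoding? bomb?

" Re-read the current file, forcing a specific 8-bit encoding
:e ++enc=cp1252

" Open a BOM-less UTF-16 little-endian file
:e ++enc=utf-16le filename.txt
```

Note that re-reading with ":e ++enc=..." discards nothing silently: if the
buffer has unsaved changes, Vim refuses unless you add a "!" to the command.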

Also with the above settings, new files will be created in Latin1. To create a
new file in UTF-8, use for instance


        :enew
        :setlocal fenc=utf-8
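Along the same lines, a file already loaded as Latin1 can be converted to UTF-8
by changing its 'fileencoding' before writing. This is only a sketch; whether
you want a BOM in the result is up to you:

```vim
" Convert the current buffer's file to UTF-8 on the next write
:setlocal fenc=utf-8

" Choose one: 'bomb' writes a BOM at the start of the file, 'nobomb' does not
:setlocal nobomb

:w
```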


See
        :help Unicode
        :help 'encoding'
        :help 'termencoding'
        :help 'fileencodings'
        :help 'fileencoding'
        :help 'bomb'
        :help ++opt


HTH,
Tony.
