yeah, using the newer 'emacs-snapshot' (GNU Emacs 22.0.91.1) here on ubuntu feisty solves most of the UTF-8-related problems in emacs, including command-line argument encoding. since i deal with some data in non-UTF-8 encodings (iso-2022, iso-2022-jp, iso-8859-x, etc.) and interact with other X11 applications that use compound-text in their selections, i do not think some of those settings would work for me.
i agree that looking for a particular substring in the locale name is the wrong approach. on a linux system i should perhaps base this on the output of the "locale charmap" command instead, but my rusty elisp is not up to that task at the moment. fortunately, the UTF-8 locales all seem to end with ".UTF-8" on this system.

On 3/18/07, Rich Felker <[EMAIL PROTECTED]> wrote:
On Sun, Mar 18, 2007 at 08:41:48AM -0700, Ben Wiley Sittler wrote:
> awesome, and thank you! however, utf-8 filenames given on the command
> line still do not work... they get turned into iso-8859-1, which is
> then utf-8 encoded before saving (?!)
>
> here's my (partial) utf-8 workaround for emacs so far:
>
> (if (string-match "XEmacs\\|Lucid" emacs-version)
>     nil
>   (condition-case nil
>       (eval
>        (if (string-match "\\.\\(UTF\\|utf\\)-?8$"
>                          (or (getenv "LC_CTYPE")
>                              (or (getenv "LC_ALL")
>                                  (or (getenv "LANG")
>                                      "C"))))
>            '(concat (set-terminal-coding-system 'utf-8)
>                     (set-keyboard-coding-system 'utf-8)
>                     (set-default-coding-systems 'utf-8)
>                     (setq file-name-coding-system 'utf-8)
>                     (set-language-environment "UTF-8"))))
>     ((error "Language environment not defined: \"UTF-8\"") nil)))

Here are all my relevant emacs settings. They work in at least emacs-21
and later; however, emacs-21 seems to be having trouble with UTF-8 on
the command line and I don't know any way around that.

; Force unix line endings and utf-8
(setq inhibit-eol-conversion t)
(prefer-coding-system 'utf-8)
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(setq file-name-coding-system 'utf-8)
(setq coding-system-for-read 'utf-8)
(setq coding-system-for-write 'utf-8)

Note that the last two may be undesirable; they force ALL files to be
treated as UTF-8, skipping any detection. This allows me to edit files
which may contain invalid sequences (like Kuhn's UTF-8 decoder test
file) or which are a mix of binary data and UTF-8. I use the
experimental unicode-2 branch of GNU emacs, and with it, forcing UTF-8
does not corrupt non-UTF-8 files: the invalid sequences are simply
shown as octal byte codes and saved back to the file as they were in
the source.
I cannot confirm that this will not corrupt files on earlier versions
of GNU emacs, however, and XEmacs ALWAYS corrupts files visited as
UTF-8 (it converts any Unicode character for which it does not have a
corresponding emacs-mule character into a replacement character), so
it's entirely unsuitable for use with UTF-8 until that's fixed (still
broken in the latest CVS as of a few months ago..).

BTW, looking for "UTF-8" in the locale string is a bad idea, since
UTF-8 is not necessarily a "special" encoding but may be the "native"
encoding for the selected language. nl_langinfo(CODESET) is the only
reliable determination, and I doubt emacs provides any direct way of
accessing it. :(

~Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
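the "locale charmap" idea from earlier in the thread could be sketched in elisp roughly as follows. this is untested; my-utf8-locale-p is a hypothetical helper name, and it assumes the "locale" command (from the C library) is on the PATH, so it only approximates what nl_langinfo(CODESET) would report:

```elisp
;; Untested sketch: ask the C library (via the external "locale"
;; command) for the actual charmap, instead of pattern-matching the
;; locale name.  `my-utf8-locale-p' is a hypothetical helper name.
(defun my-utf8-locale-p ()
  "Return non-nil if the current locale's charmap is UTF-8."
  (string-match "^UTF-8"
                (shell-command-to-string "locale charmap")))

(when (my-utf8-locale-p)
  (set-terminal-coding-system 'utf-8)
  (set-keyboard-coding-system 'utf-8)
  (set-default-coding-systems 'utf-8)
  (setq file-name-coding-system 'utf-8)
  (set-language-environment "UTF-8"))
```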
