yeah, using the newer 'emacs-snapshot' (GNU Emacs 22.0.91.1) here on ubuntu feisty solves most of the UTF-8-related problems in emacs, including command-line argument encoding. since i deal with some data in non-UTF-8 encodings (iso-2022, iso-2022-jp, iso-8859-x, etc.) and interact with other X11 applications that use compound-text in their selections, i do not think some of those settings would work for me.
i agree that looking for a particular substring in the locale name is the wrong approach. on a linux system i should perhaps base this on the output of the "locale charmap" command instead, but my rusty elisp is not up to that task at the moment. fortunately, the UTF-8 locales all seem to end with ".UTF-8" on this system.

On 3/18/07, Rich Felker <[EMAIL PROTECTED]> wrote:
On Sun, Mar 18, 2007 at 08:41:48AM -0700, Ben Wiley Sittler wrote:
> awesome, and thank you! however, utf-8 filenames given on the command
> line still do not work... they get turned into iso-8859-1, which is
> then utf-8 encoded before saving (?!)
>
> here's my (partial) utf-8 workaround for emacs so far:
>
> (if (string-match "XEmacs\\|Lucid" emacs-version)
>     nil
>   (condition-case nil
>       (eval
>        (if (string-match "\\.\\(UTF\\|utf\\)-?8$"
>                          (or (getenv "LC_CTYPE")
>                              (or (getenv "LC_ALL")
>                                  (or (getenv "LANG")
>                                      "C"))))
>            '(concat (set-terminal-coding-system 'utf-8)
>                     (set-keyboard-coding-system 'utf-8)
>                     (set-default-coding-systems 'utf-8)
>                     (setq file-name-coding-system 'utf-8)
>                     (set-language-environment "UTF-8"))))
>     ((error "Language environment not defined: \"UTF-8\"") nil)))

Here are all my relevant emacs settings. They work in at least emacs-21
and later; however, emacs-21 seems to be having trouble with UTF-8 on
the command line and I don't know any way around that.

; Force unix line endings and utf-8
(setq inhibit-eol-conversion t)
(prefer-coding-system 'utf-8)
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(setq file-name-coding-system 'utf-8)
(setq coding-system-for-read 'utf-8)
(setq coding-system-for-write 'utf-8)

Note that the last two may be undesirable; they force ALL files to be
treated as UTF-8, skipping any detection. This allows me to edit files
which may contain invalid sequences (like Kuhn's UTF-8 decoder test
file) or which are a mix of binary data and UTF-8. I use the
experimental unicode-2 branch of GNU emacs, and with it, forcing UTF-8
does not corrupt non-UTF-8 files: the invalid sequences are simply
shown as octal byte codes and saved back to the file as they were in
the source.
I cannot confirm that this will not corrupt files on earlier versions
of GNU emacs, however, and XEmacs ALWAYS corrupts files visited as
UTF-8 (it converts any Unicode character for which it does not have a
corresponding emacs-mule character into a replacement character), so
it's entirely unsuitable for use with UTF-8 until that's fixed (still
broken in the latest CVS as of a few months ago..).

BTW, looking for "UTF-8" in the locale string is a bad idea, since
UTF-8 is not necessarily a "special" encoding but may be the "native"
encoding for the selected language. nl_langinfo(CODESET) is the only
reliable determination, and I doubt emacs provides any direct way of
accessing it. :(

~Rich

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
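the "locale charmap" idea from earlier in the thread could be sketched in elisp roughly as follows. this is untested; my-utf8-locale-p is a hypothetical helper name, and it assumes the "locale" command (from the C library) is on the PATH, so it only approximates what nl_langinfo(CODESET) would report:

```elisp
;; Untested sketch: ask the C library (via the external "locale"
;; command) for the actual charmap, instead of pattern-matching the
;; locale name.  `my-utf8-locale-p' is a hypothetical helper name.
(defun my-utf8-locale-p ()
  "Return non-nil if the current locale's charmap is UTF-8."
  (string-match "^UTF-8"
                (shell-command-to-string "locale charmap")))

(when (my-utf8-locale-p)
  (set-terminal-coding-system 'utf-8)
  (set-keyboard-coding-system 'utf-8)
  (set-default-coding-systems 'utf-8)
  (setq file-name-coding-system 'utf-8)
  (set-language-environment "UTF-8"))
```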
