Re: multilingual man pages

Bruno Haible Wed, 11 Apr 2001 08:30:33 -0700
Tomohiro KUBOTA writes:

> > >   -*- coding: foo -*-
> 
> > Will "foo" be a standard IANA encoding name, or will it be an Emacs
> > name?
> 
> Both, as far as possible.  My implementation has an Emacs name -> IANA
> name conversion table.  It will also need an IANA name -> iconv() name
> conversion table (platform-dependent).

Good. But in those cases where Emacs doesn't know the IANA name, users
will be forced to use the Emacs name, right? So they will end up using
the Emacs name in some cases and the standard name in others.

Wouldn't it be better to use standard names in all cases, and use a
simple Emacs Lisp function to convert the standard name to an Emacs
name?  The Emacs PO mode already has code for this. It's quite easy,
and it has the advantage of confining Emacs-isms to Emacs proper.

Here's the Emacs code that PO mode uses for this.

(autoload 'po-find-file-coding-system "po-mode")
(modify-coding-system-alist 'file "\\.po[tx]?\\'\\|\\.po\\."
                            'po-find-file-coding-system)

(defconst po-content-type-charset-alist
  '(; Note: Emacs 20 doesn't support all encodings, thus the missing entries.
    (ASCII . undecided)
    (ANSI_X3.4-1968 . undecided)
    (US-ASCII . undecided)
    (ISO-8859-1 . iso-8859-1)
    (ISO_8859-1 . iso-8859-1)
    (ISO-8859-2 . iso-8859-2)
    (ISO_8859-2 . iso-8859-2)
    (ISO-8859-3 . iso-8859-3)
    (ISO_8859-3 . iso-8859-3)
    (ISO-8859-4 . iso-8859-4)
    (ISO_8859-4 . iso-8859-4)
    (ISO-8859-5 . iso-8859-5)
    (ISO_8859-5 . iso-8859-5)
    ;(ISO-8859-6 . ??)
    ;(ISO_8859-6 . ??)
    (ISO-8859-7 . iso-8859-7)
    (ISO_8859-7 . iso-8859-7)
    (ISO-8859-8 . iso-8859-8)
    (ISO_8859-8 . iso-8859-8)
    (ISO-8859-9 . iso-8859-9)
    (ISO_8859-9 . iso-8859-9)
    ;(ISO-8859-13 . ??)
    ;(ISO_8859-13 . ??)
    ;(ISO-8859-15 . ??)
    ;(ISO_8859-15 . ??)
    (KOI8-R . koi8-r)
    ;(KOI8-U . ??)
    ;(CP850 . ??)
    ;(CP866 . ??)
    ;(CP874 . ??)
    ;(CP932 . ??)
    ;(CP949 . ??)
    ;(CP950 . ??)
    ;(CP1250 . ??)
    ;(CP1251 . ??)
    ;(CP1252 . ??)
    ;(CP1253 . ??)
    ;(CP1254 . ??)
    ;(CP1255 . ??)
    ;(CP1256 . ??)
    ;(CP1257 . ??)
    ;(GB2312 . euc-cn)
    ;(EUC-JP . euc-jp)
    ;(EUC-KR . euc-kr)
    ;(EUC-TW . ??)
    ;(BIG5 . big5)
    ;(BIG5HKSCS . ??)
    ;(GBK . ??)
    ;(GB18030 . ??)
    ;(SJIS . shift_jis)
    ;(JOHAB . ??)
    ;(TIS-620 . th-tis620)
    ;(VISCII . viscii)
    (UTF-8 . utf-8)        ; requires Mule-UCS in Emacs 20, or Emacs 21
    )
  "How to convert a GNU libc/libiconv canonical charset name as seen in
Content-Type into a Mule coding system.")

(defun po-find-file-coding-system (arg-list)
  "Return a Mule (DECODING . ENCODING) pair, according to PO file charset.
Called through file-coding-system-alist, before the file is visited for real."
  (and (eq (car arg-list) 'insert-file-contents)
       (with-temp-buffer
         (let ((coding-system-for-read 'no-conversion))
           (insert-file-contents (nth 1 arg-list) nil 0 4096)
           (if (re-search-forward
                "^\"Content-Type: text/plain;[ \t]*charset=\\([^\\]+\\)"
                nil t)
               (let* ((charset (buffer-substring
                                 (match-beginning 1) (match-end 1)))
                      (charset-upper (intern (upcase charset)))
                      (charset-lower (intern (downcase charset))))
                 (list (or (cdr (assq charset-upper
                                      po-content-type-charset-alist))
                           (if (memq charset-lower (coding-system-list))
                               charset-lower
                             'no-conversion))))
             '(no-conversion))))))
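
For man pages, the name lookup on its own could be about this small.
This is only a sketch: the names manpage-charset-alist and
manpage-charset-to-coding-system are made up for illustration and are
not part of PO mode, and the table is just a small sample of the one
above. It assumes the standard charset name has already been extracted
from the file as a string.

```elisp
;; Hypothetical sample table; a real one would carry the full list.
(defconst manpage-charset-alist
  '((US-ASCII . undecided)
    (ISO-8859-1 . iso-8859-1)
    (KOI8-R . koi8-r)
    (UTF-8 . utf-8))
  "Sample mapping from standard charset names to Emacs coding systems.")

;; Hypothetical helper, modelled on the lookup in po-find-file-coding-system.
(defun manpage-charset-to-coding-system (charset)
  "Return the Emacs coding system for the standard name CHARSET, a string.
Look CHARSET up in `manpage-charset-alist' first.  Otherwise use the
lowercased name if Emacs knows it as a coding system, else `no-conversion'."
  (let ((upper (intern (upcase charset)))
        (lower (intern (downcase charset))))
    (or (cdr (assq upper manpage-charset-alist))
        (and (memq lower (coding-system-list)) lower)
        'no-conversion)))
```

With this, (manpage-charset-to-coding-system "KOI8-R") yields koi8-r,
and a name that neither the table nor Emacs knows falls back to
no-conversion, so only this one function has to know about Emacs names.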

Bruno
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/