On 26.03.2005 at 03:18, Richard Stallman wrote:
Emacs needs to learn more about combining characters before it can handle such things correctly.
Emacs knows quite a bit about combining characters; what precisely is it doing wrong?
In xterm's tcsh these are set:
LANG=de_DE.UTF-8
LC_ALL=de_DE.UTF-8
LC_CTYPE=de_DE.UTF-8
TERM=xterm-color
TERMPATH=/Users/pete/.termcap:/usr/share/misc/termcap:/usr/X11R6/lib/X11/etc/xterm.termcap
nokanji
version tcsh 6.12.00 (Astron) 2002-07-23 (powerpc-apple-darwin) options 8b,nls,dl,al,kan,sm,rh,color,dspm,filec
ls -lw shows these file names correctly:
-rwxrwxr-x 1 pete pete 32216 17 Nov 2002 RGB ÃÃÃÃÃÃÃÃ.txt
-rw-r--r-- 1 pete pete 62 25 Mär 01:38 ÃÃÃÅÃ.txt
-rw-r--r-- 1 pete pete 107 2 Dez 21:29 ÃÃÃÃÃÃÃâ
as Finder or dired-mode in X11 do too. In GNU Emacs 22.0.50.1 (powerpc-apple-darwin7.8.0, X toolkit, Xaw3d scroll bars) of 2005-03-25 on localhost (last CVS update on 2005-03-19, patches from Stefan Monnier)
configured using `configure '--without-carbon' '--with-x' '--without-pop' '--with-xpm' '--with-jpeg' '--with-tiff' '--with-png' '--with-gif' '--with-x-toolkit=lucid' 'CFLAGS=-I/sw/include' 'CPPFLAGS=-I/sw/include' 'LDFLAGS=-L/sw/lib''
Important settings:
  value of $LC_ALL: de_DE.UTF-8
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: de_DE.UTF-8
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: de_DE.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t
Major mode: Calendar
Minor modes in effect:
  auto-compression-mode: t
  display-time-mode: t
  mouse-sel-mode: t
  show-paren-mode: t
  encoded-kbd-mode: t
  menu-bar-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  unify-8859-on-decoding-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t
in its dired-mode (-uuu:%% in the modeline) in xterm I can see the file names mentioned above correctly, but the rightmost position the cursor can reach is a few columns past the end of the file's name. For RGB ÃÃÃÃÃÃÃÃ.txt the name ends in column 69, yet C-e moves the cursor to column 76 (as reported by column-number-mode). 'Spelling' the file's name with C-u C-x =, I get the following (starting with <SPC> at column 57):
character: SPC (040, 32, 0x20, U+0020)
charset: ascii (ASCII (ISO646 IRV))
code point: 32
syntax: which means: whitespace
category: a:ASCII l:Latin
buffer code: 0x20
file code: 0x20 (encoded by coding system mule-utf-8)
display: terminal code 0x20
character: a (0141, 97, 0x61, U+0061) ; *Help* in -uuu:-- and minibuffer
charset: ascii (ASCII (ISO646 IRV)) ; show an 'a', column 58, should
code point: 97 ; be 'ä'
syntax: w which means: word
category: a:ASCII l:Latin
buffer code: 0x61
file code: 0x61 (encoded by coding system mule-utf-8)
display: terminal code 0x61
character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows a dieresis on :
syntax: w which means: word ; after Char, *Help* shows nothing,
category: ^:Combining diacritic or mark ; column is 59, should be 'Ã' now
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88
character: o (0157, 111, 0x6f, U+006F) ; at column 60 is an 'Ã'
charset: ascii (ASCII (ISO646 IRV))
code point: 111
syntax: w which means: word
category: a:ASCII l:Latin
buffer code: 0x6F
file code: 0x6F (encoded by coding system mule-utf-8)
display: terminal code 0x6F
character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows a dieresis on :
syntax: w which means: word ; after Char, *Help* shows nothing,
category: ^:Combining diacritic or mark ; should be 'Ã' in column 61
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88
Now, at the first column in dired after RGB ÃÃÃÃÃÃÃÃ.txt:
character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows and describes A,
syntax: w which means: word ; *Help* shows nothing as character
category: ^:Combining diacritic or mark ; column is 70, should be linefeed
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88
character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows a dieresis on :
syntax: w which means: word ; after Char, *Help* shows nothing,
category: ^:Combining diacritic or mark ; column is 71, one after linefeed
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88
The next characters in the file's name are '.', 't', 'x', 't', and 'C-j' at column 76. The next <right> moves the cursor to the next line. This buffer also has a line like this:
-rw-r--r-- 1 pete pete 10992 13 Mär 19:31 RefTeX-inst.txt
Here the cursor's rightmost position is one after 'txt', and the month's name 'Mär' is spelled like this:
character: M (0115, 77, 0x4d, U+004D)
charset: ascii (ASCII (ISO646 IRV))
code point: 77
syntax: w which means: word
category: a:ASCII l:Latin
buffer code: 0x4D
file code: 0x4D (encoded by coding system mule-utf-8)
display: terminal code 0x4D
character: ä (04344, 2276, 0x8e4, U+00E4)
charset: latin-iso8859-1
(Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
code point: 100
syntax: w which means: word
category: l:Latin
buffer code: 0x81 0xE4
file code: 0xC3 0xA4 (encoded by coding system mule-utf-8)
display: terminal code 0xC3 0xA4
character: r (0162, 114, 0x72, U+0072)
charset: ascii (ASCII (ISO646 IRV))
code point: 114
syntax: w which means: word
category: a:ASCII l:Latin
buffer code: 0x72
file code: 0x72 (encoded by coding system mule-utf-8)
display: terminal code 0x72
Incremental search for 'ä' lets me find the month Mär, but not as part of a file's name!
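The difference between the month name and the file names in the dumps above can be reproduced outside Emacs. The following is only a small Python sketch, not Emacs code, and the helper names (utf8_hex, same_after_nfc, display_columns) are made up for illustration. It assumes the file names store the umlaut as a base letter followed by U+0308, as the describe-char output suggests, while the month name uses the single character U+00E4. The two spellings are different character sequences, which would explain why a search for one does not match the other, and the combining mark adds a character without adding a display column.

```python
import unicodedata

# Decomposed spelling, as in the file name dump: 'a' followed by
# U+0308 COMBINING DIAERESIS.
decomposed = "a\u0308"
# Precomposed spelling, as in the month name dump: U+00E4.
precomposed = "\u00e4"


def utf8_hex(text):
    """Return the UTF-8 bytes of text as a list of hex strings."""
    return [f"0x{b:02X}" for b in text.encode("utf-8")]


def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def display_columns(text):
    """Count characters that are not combining marks (a rough column count)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


print(utf8_hex(decomposed))    # ['0x61', '0xCC', '0x88']
print(utf8_hex(precomposed))   # ['0xC3', '0xA4']
print(decomposed == precomposed)                 # False
print(same_after_nfc(decomposed, precomposed))   # True
print(len(decomposed), display_columns(decomposed))  # 2 1
```

The byte values match the "file code" lines in the dumps (0xCC 0x88 for U+0308, 0xC3 0xA4 for U+00E4). Whether Emacs should normalize before searching or before counting columns is a separate question; the sketch only shows that the two forms differ until they are normalized.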
Here is some mule-diag:
Multibyte characters awareness:
default: t
current-buffer: t
Current language environment: German
########################################
# Section 2. Display
########################################
Terminal: xterm-color
Coding system of the terminal: utf-8
Anything more?
-- Greetings
Pete
