On 26.03.2005 at 03:18, Richard Stallman wrote:

    Emacs needs to learn more about combining characters before it can
    handle such things correctly.

Emacs knows quite a bit about combining characters;
what precisely is it doing wrong?


In xterm's tcsh these are set:

LANG=de_DE.UTF-8
LC_ALL=de_DE.UTF-8
LC_CTYPE=de_DE.UTF-8
TERM=xterm-color
TERMPATH=/Users/pete/.termcap:/usr/share/misc/termcap:/usr/X11R6/lib/X11/etc/xterm.termcap
nokanji
version tcsh 6.12.00 (Astron) 2002-07-23 (powerpc-apple-darwin) options 8b,nls,dl,al,kan,sm,rh,color,dspm,filec

ls -lw shows these file names correctly:

-rwxrwxr-x 1 pete pete 32216 17 Nov 2002 RGB ÃÃÃÃÃÃÃÃ.txt
-rw-r--r-- 1 pete pete 62 25 Mär 01:38 ÃÃÃÅÃ.txt
-rw-r--r-- 1 pete pete 107 2 Dez 21:29 ÃÃÃÃÃÃÃâ


Finder and dired-mode under X11 show them correctly too.

In GNU Emacs 22.0.50.1 (powerpc-apple-darwin7.8.0, X toolkit, Xaw3d scroll bars) of 2005-03-25 on localhost (last CVS update on 2005-03-19, patches from Stefan Monnier)
configured using `configure '--without-carbon' '--with-x' '--without-pop' '--with-xpm' '--with-jpeg' '--with-tiff' '--with-png' '--with-gif' '--with-x-toolkit=lucid' 'CFLAGS=-I/sw/include' 'CPPFLAGS=-I/sw/include' 'LDFLAGS=-L/sw/lib''


Important settings:
  value of $LC_ALL: de_DE.UTF-8
  value of $LC_COLLATE: nil
  value of $LC_CTYPE: de_DE.UTF-8
  value of $LC_MESSAGES: nil
  value of $LC_MONETARY: nil
  value of $LC_NUMERIC: nil
  value of $LC_TIME: nil
  value of $LANG: de_DE.UTF-8
  locale-coding-system: utf-8
  default-enable-multibyte-characters: t

Major mode: Calendar

Minor modes in effect:
  auto-compression-mode: t
  display-time-mode: t
  mouse-sel-mode: t
  show-paren-mode: t
  encoded-kbd-mode: t
  menu-bar-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  unify-8859-on-decoding-mode: t
  unify-8859-on-encoding-mode: t
  utf-translate-cjk-mode: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

In its dired-mode (-uuu:%% in the mode line) in xterm I can see the file names mentioned above correctly, but the rightmost position the cursor can reach is a few columns past the end of the file name. For RGB ÃÃÃÃÃÃÃÃ.txt the name ends in column 69, yet C-e moves the cursor to column 76 (as reported by column-number-mode). 'Spelling' the file name with C-u C-x =, I get the following (starting with <SPC> at column 57):

  character: SPC (040, 32, 0x20, U+0020)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 32
     syntax:    which means: whitespace
   category: a:ASCII   l:Latin
buffer code: 0x20
  file code: 0x20 (encoded by coding system mule-utf-8)
    display: terminal code 0x20

character: a (0141, 97, 0x61, U+0061) ; *Help* in -uuu:-- and minibuffer
charset: ascii (ASCII (ISO646 IRV)) ; show an 'a', column 58, should
code point: 97 ; be 'ä'
syntax: w which means: word
category: a:ASCII l:Latin
buffer code: 0x61
file code: 0x61 (encoded by coding system mule-utf-8)
display: terminal code 0x61


character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows a dieresis on :
syntax: w which means: word ; after Char, *Help* shows nothing,
category: ^:Combining diacritic or mark ; column is 59, should be 'ö' now
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88


  character: o (0157, 111, 0x6f, U+006F)                ; at column 60 is an 'Ã'
    charset: ascii (ASCII (ISO646 IRV))
 code point: 111
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x6F
  file code: 0x6F (encoded by coding system mule-utf-8)
    display: terminal code 0x6F

character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows a dieresis on :
syntax: w which means: word ; after Char, *Help* shows nothing,
category: ^:Combining diacritic or mark ; should be 'Ã' in column 61
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88


now in first column in dired after RGB ÃÃÃÃÃÃÃÃ.txt:

character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows and describes A,
syntax: w which means: word ; *Help* shows nothing as character
category: ^:Combining diacritic or mark ; column is 70, should be linefeed
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88


character: (01211310, 332488, 0x512c8, U+0308)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point: 37 72 ; minibuffer shows a dieresis on :
syntax: w which means: word ; after Char, *Help* shows nothing,
category: ^:Combining diacritic or mark ; column is 71, one after linefeed
buffer code: 0x9C 0xF4 0xA5 0xC8
file code: 0xCC 0x88 (encoded by coding system mule-utf-8)
display: terminal code 0xCC 0x88
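
The seven-column gap (name ends at column 69, C-e lands on column 76) would fit a file name in which seven letters are stored in decomposed form, a base letter followed by U+0308, with each combining character counted as a column of its own. That is only an assumption, not something checked against the Emacs sources. A minimal Python sketch of the two ways of counting; the sample string is hypothetical and is not the real file name:

```python
import unicodedata

def naive_columns(text):
    # One column per code point, combining marks included.
    return len(text)

def display_columns(text):
    # Skip code points with a non-zero canonical combining class
    # (U+0308 has class 230); a terminal draws them on the previous cell.
    # Wide (e.g. CJK) characters are ignored in this sketch.
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)

# Hypothetical sample: 'a' + U+0308, 'o' + U+0308, then ".txt".
sample = "a\u0308o\u0308.txt"
print(naive_columns(sample), display_columns(sample))  # 8 6
```

Each decomposed letter adds one to the naive count, so seven of them would account for the difference of seven.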


The next characters in the file name are '.', 't', 'x', 't', and then 'C-j' at column 76. The next <right> moves the cursor to the following line. This buffer also has lines like this:

  -rw-r--r--    1 pete   pete      10992 13 Mär 19:31 RefTeX-inst.txt

Here the cursor's rightmost position is one after 'txt', and the month name 'Mär' is spelled like this:

  character: M (0115, 77, 0x4d, U+004D)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 77
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x4D
  file code: 0x4D (encoded by coding system mule-utf-8)
    display: terminal code 0x4D

  character: ä (04344, 2276, 0x8e4, U+00E4)
    charset: latin-iso8859-1 (Right-Hand Part of Latin Alphabet 1 (ISO/IEC 8859-1): ISO-IR-100.)
 code point: 100
     syntax: w  which means: word
   category: l:Latin
buffer code: 0x81 0xE4
  file code: 0xC3 0xA4 (encoded by coding system mule-utf-8)
    display: terminal code 0xC3 0xA4


  character: r (0162, 114, 0x72, U+0072)
    charset: ascii (ASCII (ISO646 IRV))
 code point: 114
     syntax: w  which means: word
   category: a:ASCII   l:Latin
buffer code: 0x72
  file code: 0x72 (encoded by coding system mule-utf-8)
    display: terminal code 0x72

Incremental search for 'ä' lets me find the month Mär, but not as part of a file's name!
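
One possible explanation, offered as an assumption only: the typed search character is the single precomposed code point U+00E4 (as in the month name), while the file names hold 'a' followed by U+0308, and a search that compares code points does not treat the two as equal. A small Python sketch of that mismatch and of how Unicode normalization removes it. The file name is hypothetical, and the sketch says nothing about how isearch is actually implemented:

```python
import unicodedata

precomposed = "\u00e4"   # one code point, as in the month name
decomposed = "a\u0308"   # 'a' + COMBINING DIAERESIS, as in the file name

# The UTF-8 bytes agree with the "file code" lines in the dumps.
assert precomposed.encode("utf-8") == b"\xc3\xa4"
assert decomposed[1].encode("utf-8") == b"\xcc\x88"

def find_plain(needle, haystack):
    # Plain code-point comparison, no normalization.
    return haystack.find(needle)

def find_normalized(needle, haystack):
    # Bring both strings to NFC before comparing; the returned
    # index refers to the normalized haystack.
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    nfc_needle = unicodedata.normalize("NFC", needle)
    return nfc_haystack.find(nfc_needle)

name = "RGB " + decomposed + ".txt"        # hypothetical file name
print(find_plain(precomposed, name))       # -1
print(find_normalized(precomposed, name))  # 4
```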

Here is some mule-diag:

        Multibyte characters awareness:
          default: t
          current-buffer: t
        
        Current language environment: German
        
        ########################################
        # Section 2.  Display
        ########################################
        
        Terminal: xterm-color
        
        Coding system of the terminal: utf-8


Anything more?

--
Greetings

  Pete



_______________________________________________
Emacs-pretest-bug mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
