Re: Unicode Normalization on Mac OS X (HFS+ filesystem)
Stephan Kleisinger wrote:
> Description:
> The Mac OS X filesystem HFS+ reports filenames in Unicode Normalization
> Form D (NFD). As arguments (e.g. to open()), all normalization forms are
> accepted. Input to bash from Terminal.app is usually in NFC. This causes
> problems, for example with the German directory name Bücher (Buecher/Books):
>
> 1. Bash completion does not work:
>    BüTAB - nothing
>    BuTAB - works
>    (Globbing with Bü* and Bu* behaves the same way.)
> 2. If \w is included in $PS1, the display length is calculated wrongly, so
>    when using the arrow keys to recall the history, the display is disrupted.
> 3. When an argument is completed (BuTAB -> Bücher), the argument is in NFD.
>    Deleting the argument results in a wrong cursor position.

The problem is that there is no way, using standard interfaces, to distinguish or convert between the two forms on Mac OS X, at least none that I have found (and I've been looking at this for some time). I'm not particularly interested in using Mac OS-specific APIs; I have no experience with them.

Chet

-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
Chet Ramey, ITS, CWRU    [EMAIL PROTECTED]    http://cnswww.cns.cwru.edu/~chet/
Unicode Normalization on Mac OS X (HFS+ filesystem)
Configuration Information [Automatically generated, do not change]:
Machine: i386
OS: darwin9.3.0
Compiler: /usr/bin/gcc-4.0
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i386' -DCONF_OSTYPE='darwin9.3.0' -DCONF_MACHTYPE='i386-apple-darwin9.3.0' -DCONF_VENDOR='apple' -DLOCALEDIR='/opt/local/share/locale' -DPACKAGE='bash' -DSHELL -DHAVE_CONFIG_H -DMACOSX -I. -I. -I./include -I./lib -I/opt/local/include -O2
uname output: Darwin cicero.lan 9.3.0 Darwin Kernel Version 9.3.0: Fri May 23 00:49:16 PDT 2008; root:xnu-1228.5.18~1/RELEASE_I386 i386
Machine Type: i386-apple-darwin9.3.0

Bash Version: 3.2
Patch Level: 39
Release Status: release

Description:
The Mac OS X filesystem HFS+ reports filenames in Unicode Normalization Form D (NFD). As arguments (e.g. to open()), all normalization forms are accepted. Input to bash from Terminal.app is usually in NFC. This causes problems, for example with the German directory name Bücher (Buecher/Books):

1. Bash completion does not work:
   BüTAB - nothing
   BuTAB - works
   (Globbing with Bü* and Bu* behaves the same way.)
2. If \w is included in $PS1, the display length is calculated wrongly, so when using the arrow keys to recall the history, the display is disrupted.
3. When an argument is completed (BuTAB -> Bücher), the argument is in NFD. Deleting the argument results in a wrong cursor position.

Repeat-By:
Use the standard Terminal.app.

[EMAIL PROTECTED]:/tmp $ echo Bücher | hd
42 c3 bc 63 68 65 72 0a  |Bücher.|   - NFC
0008
[EMAIL PROTECTED]:/tmp $ mkdir Bücher | hd
[EMAIL PROTECTED]:/tmp $ ls -d B* | hd
42 75 cc 88 63 68 65 72 0a  | Bu?.cher.|   - NFD
0009
[EMAIL PROTECTED]:/tmp $ cd Bü*
bash: cd: Bü*: No such file or directory
[EMAIL PROTECTED]:/tmp $ cd Bu*
[EMAIL PROTECTED]:/tmp/Bücher $

Using the history:
[EMAIL PROTECTED]:/tmp/Bücher $
[EMAIL PROTECTED]:/tmp/Büchrenampstree -s Ter
[EMAIL PROTECTED]:/tmp/Bücher $ cd ..
Using backspace to delete the completed filename:
cd BuTAB Backspace - the cursor is one character left of its intended position.