Kenichi Handa <[EMAIL PROTECTED]> writes:

> In article <[EMAIL PROTECTED]>, Joe Wells <[EMAIL PROTECTED]> writes:
>
>> I'm using the Gentoo ebuild app-editors/emacs-22.0.50_pre20050225
>> which is based on a CVS snapshot from last year.
>
>> Try evaluating this:
>
>> (let ((unicode-char-hex-string
>>        (format "%x"
>>                (encode-char
>>                 (aref (decode-coding-string
>>                        ;; UTF-8 for U+1D161 (MUSICAL SYMBOL SIXTEENTH NOTE):
>>                        "\355\205\241"
>>                        'utf-8) 0)
>>                 'ucs))))
>>   (if (equal "d161" unicode-char-hex-string)
>>       (error "Oh no! Emacs dropped 17th bit when decoding the character!")))
>
> That version of Emacs supports only BMP as written in the
> documentation of utf-8 coding system.

Yes, but it should handle the character in the same way as any other
character outside of its range.  There is this comment in utf-8.el:

  ;; We compose the untranslatable sequences into a single character,
  ;; and move point to the next character.
  ;; This is infelicitous for editing, because there's currently no
  ;; mechanism for treating compositions as atomic, but is OK for
  ;; display.  They are composed to U+FFFD with help-echo which
  ;; indicates the unicodes they represent.
  ...

In my case, this seemed not to be working.  Instead, it seemed it was
translating the sequence to the wrong character.

However, I have since discovered the real problem.  I was editing the
file /usr/lib/X11/locale/en_US.UTF-8/Compose and it has a line that
reads like this:

----------------------------------------------------------------------
<Multi_key> <U1d15f> <U1d16f> : "텡" U1D161 # MUSICAL SYMBOL SIXTEENTH NOTE
----------------------------------------------------------------------

However, although it claims on the line that the code of the character
in the quotes is U+1D161, in fact the character there is actually
U+D161 encoded in UTF-8 as ED 85 A1.  The correct UTF-8 encoding of
U+1D161 would be F0 9D 85 A1.

Sorry for the false alarm!  The bug is in the xorg-x11 distribution on
my machine.  I was wrong to believe this file was correct.

-- 
Joe

> u -- utf-8 (alias of mule-utf-8)
>
> UTF-8 encoding for Emacs-supported Unicode characters.
> It supports Unicode characters of these ranges:
> U+0000..U+33FF, U+E000..U+FFFF.
> They correspond to these Emacs character sets:
> ascii, latin-iso8859-1, mule-unicode-0100-24ff,
> mule-unicode-2500-33ff, mule-unicode-e000-ffff
> [...]
>
> ---
> Kenichi Handa
> [EMAIL PROTECTED]

_______________________________________________
emacs-pretest-bug mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/emacs-pretest-bug
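
For anyone who wants to double-check the byte values discussed in this
message without relying on a particular Emacs build, here is a minimal
sketch in Python.  It is not part of the original exchange, and the
helper names (hex_bytes, code_point_of) are made up for illustration.
It decodes the three bytes from the test string and encodes U+1D161
for comparison.

```python
def hex_bytes(data: bytes) -> str:
    """Format a byte string as space-separated upper-case hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


def code_point_of(data: bytes) -> str:
    """Decode UTF-8 bytes that hold exactly one character; return 'U+XXXX'."""
    text = data.decode("utf-8")
    if len(text) != 1:
        raise ValueError("expected exactly one character")
    return f"U+{ord(text):04X}"


# The bytes written as "\355\205\241" (octal escapes) in the test string.
test_bytes = bytes([0o355, 0o205, 0o241])

# What those bytes actually decode to.
found = code_point_of(test_bytes)

# What the UTF-8 encoding of U+1D161 looks like.
wanted_bytes = chr(0x1D161).encode("utf-8")

print(hex_bytes(test_bytes), "decodes to", found)
print("U+1D161 encodes to", hex_bytes(wanted_bytes))
```

The three-byte form can only carry 16 bits of payload, so a sequence
starting with ED can never stand for a character above U+FFFF; a
character outside the BMP always needs a four-byte sequence starting
with F0..F4.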
