I think I have run into a serious bug with XFree86's Xutf8LookupString implementation. It occurs when the client runs under XFree86 4.[12], but the X server is, for example, Solaris 5.8
  vendor string: Sun Microsystems, Inc.
  vendor release number: 6410

(also occurs on Solaris 5.7). It does not occur when the X server is XFree86 4.1.

i) To reproduce the problem, start an X client in the following environment:

  - use a UTF-8 locale (e.g., LC_CTYPE=en_GB.UTF-8)
  - use an XFree86 4.1 or 4.2 Linux system (tested on Red Hat 7.2,
    Red Hat 8.0 and SuSE 8.1)
  - point $DISPLAY to a Sun Solaris X server

Then press on the Sun X server a key that causes the keysym "adiaeresis" to be sent to the above client. The client will receive from the various key string lookup functions the following strings (as displayed in a UTF-8 xterm, hex values provided for clarity):

  XLookupString gives 2 bytes: "ä" (c3 a4)
  XmbLookupString gives 4 bytes: "ä" (c3 83 c2 a4)
  Xutf8LookupString gives 4 bytes: "ä" (c3 83 c2 a4)

There are two problems, the first critical, the second dubious:

a) CRITICAL: Both X{mb,utf8}LookupString output the same broken byte sequence that one gets if one erroneously sends the UTF-8 sequence for "ä" (c3 a4) through an ISO 8859-1 -> UTF-8 converter, i.e. c3 83 c2 a4.

b) DUBIOUS: According to the manual, XLookupString is supposed to *always* return ISO 8859-1 strings (just like STRING atoms always use ISO 8859-1), but here it actually returns text in the locale's multibyte encoding. (This is OK if we can agree to change the libX11 C API definition accordingly, but it looks suspiciously like someone has been hacking without respect for the API spec.)

ii) If the same setup as in i) is used, but the locale of the client is replaced with an ISO 8859-1 locale (e.g., en_GB), then the result looks correct (as displayed in an ISO 8859-1 xterm):

  XLookupString gives 1 byte: "ä" (e4)
  XmbLookupString gives 1 byte: "ä" (e4)
  Xutf8LookupString gives 2 bytes: "ä" (c3 a4)

iii) Similarly, if the same setup as in i) is used, but the X server is XFree86 4 with e.g.
  vendor string: The XFree86 Project, Inc
  vendor release number: 40100000
  XFree86 version: 4.1.0

at least the CRITICAL problem is gone (as displayed in a UTF-8 xterm):

  XLookupString gives 2 bytes: "ä" (c3 a4)
  XmbLookupString gives 2 bytes: "ä" (c3 a4)
  Xutf8LookupString gives 2 bytes: "ä" (c3 a4)

All the above output is from a patched version of xev that outputs the strings from all three lookup functions.

This bug report distills my findings reported here earlier that UTF-8 keyboard support fails from a Sun X server with the xterm and emacs implementations in Red Hat 8.0.

Any ideas or reports of reproducibility would be welcome. This might turn into a high-priority problem, as breaking the X protocol this way might be a major UTF-8 show stopper. Please have a look at it ...

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
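
For anyone who wants to check the byte sequences from problem a) without an X setup, the double encoding can be reproduced in a few lines of Python. This is only an illustrative sketch of the suspected symptom and not a statement about where in libX11 the extra conversion happens. The helper names latin1_to_utf8 and undo_double_encoding are made up for this example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Treat every input byte as one ISO 8859-1 character and encode it as UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> bytes:
    """Reverse one spurious ISO 8859-1 -> UTF-8 conversion step."""
    return data.decode("utf-8").encode("iso-8859-1")


# Correct UTF-8 encoding of U+00E4 (keysym "adiaeresis").
correct = "\u00e4".encode("utf-8")

# Feeding bytes that are already UTF-8 through the converter again
# yields the broken 4-byte sequence from the report.
double = latin1_to_utf8(correct)

print(correct.hex(" "))                       # c3 a4
print(double.hex(" "))                        # c3 83 c2 a4
print(undo_double_encoding(double).hex(" "))  # c3 a4
```

The 4-byte output matches what X{mb,utf8}LookupString return in case i), which is consistent with a string that is already UTF-8 being converted as if it were ISO 8859-1.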