I think I have run into a serious bug in XFree86's Xutf8LookupString
implementation. It occurs when the client runs under XFree86 4.[12], but
the X server is, for example, that of Solaris 5.8

  vendor string:    Sun Microsystems, Inc.
  vendor release number:    6410

(also occurs on Solaris 5.7). It does not occur when the X server is
XFree86 4.1.

i) To reproduce the problem, start an X client in the following
environment:

  - use a UTF-8 locale (e.g., LC_CTYPE=en_GB.UTF-8)
  - use an XFree86 4.1 or 4.2 Linux system (tested on Red Hat 7.2,
    Red Hat 8.0 and SuSE 8.1)
  - point $DISPLAY to a Sun Solaris X server

Then, on the Sun X server, press a key that causes the keysym
"adiaeresis" to be sent to the above client. The client will
receive the following strings from the various key string lookup
functions (as displayed in a UTF-8 xterm, hex values provided for
clarity):

    XLookupString gives 2 bytes:  "ä" (c3 a4)
    XmbLookupString gives 4 bytes:  "Ã¤" (c3 83 c2 a4)
    Xutf8LookupString gives 4 bytes:  "Ã¤" (c3 83 c2 a4)

There are two problems, the first critical, the second dubious:

  a) CRITICAL: Both X{mb,utf8}LookupString output the same broken
     byte sequence that one gets if one sends the UTF-8 sequence for
     "ä" (c3 a4) erroneously through an ISO 8859-1 -> UTF-8 converter,
     i.e. c3 83 c2 a4.

  b) DUBIOUS: According to the manual, XLookupString is supposed to *always*
     return ISO 8859-1 strings (just like STRING atoms always use ISO 8859-1),
     but here it actually returns text in the locale's multibyte encoding.
     (This is OK if we can agree to change the libX11 C API
     definition accordingly, but it looks suspiciously like someone has
     been HACKing without respect for the API spec.)

ii) If the same setup as in i) is used, but the locale of the client is
replaced with an ISO 8859-1 locale (e.g., en_GB), then the result looks
correct (as displayed in an ISO 8859-1 xterm):

    XLookupString gives 1 bytes:  "ä" (e4)
    XmbLookupString gives 1 bytes:  "ä" (e4)
    Xutf8LookupString gives 2 bytes:  "ä" (c3 a4)

iii) Similarly, if the same setup as in i) is used, but the X server is
XFree86 4 with e.g.

  vendor string:    The XFree86 Project, Inc
  vendor release number:    40100000
  XFree86 version: 4.1.0

at least the CRITICAL problem is gone (as displayed in a UTF-8 xterm):

    XLookupString gives 2 bytes:  "ä" (c3 a4)
    XmbLookupString gives 2 bytes:  "ä" (c3 a4)
    Xutf8LookupString gives 2 bytes:  "ä" (c3 a4)

All the above output is from a patched version of xev that outputs
the strings from all three lookup functions.

This bug report distills my findings reported here earlier that
UTF-8 keyboard support fails from a Sun X server with the xterm and
emacs implementations in Red Hat 8.0.

Any ideas or reports of reproducibility would be welcome. This might
turn into a high priority problem, as breaking the X protocol this way
might be a major UTF-8 show stopper. Please have a look at it ...

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>