Re: [I18n] urdu keymap
A brief purist's rant on keymaps and keysyms:

X11 keymaps were originally merely meant to represent what symbols are printed/engraved on the keycaps of the keyboard hardware used (see Appendix A of the X11 protocol specification).

Is the proposed Urdu keyboard layout already used by any keyboard manufacturer, or is this foreseen in the immediate future? Can you send us a photo of a production prototype of that keyboard?

I am asking because I somewhat doubt that either XFree86 or X.Org is particularly interested in becoming a repository for invisible convenience keymaps that do not actually reflect the symbols visible on real-world hardware. People can always configure such convenience additions privately using xmodmap. Maintaining a repository of personal convenience keymaps sounds like a very open-ended endeavor.

Perhaps there should be a requirement to provide a high-resolution photograph of a real keyboard along with each keymap that gets into the X11 distribution? That would also tremendously simplify the identification of which keymap belongs to which hardware, and it would help us at least a bit to weed out all the non-existing fantasy keyboards.

Markus

--
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
Re: [I18n] urdu keymap
Kakilik Group wrote on 2004-09-01 15:42 UTC:
> There are no any hardware keyboard for Turkmen language (My language). Their alphabet is too young (1991-...).

Does the keysymdef.h file that we are currently revising on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysymdef.h

cover all the characters that you need?

Markus
[I18n] Re: Revision of Appendix A of the X11 Protocol Spec: KEYSYM Encoding
Alan Coopersmith wrote on 2004-08-16 16:35 UTC:
> Markus Kuhn wrote:
> > I have substantially revised and updated the long neglected KEYSYM Encoding specification in Appendix A of the X11 Protocol Standard. The result, which I propose for inclusion into the next X.Org release, is on
> >
> >   http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms.pdf
>
> While this looks good at a first glance, I think at this point it will have to wait for the release after X11R6.8 since there's simply not time for everyone to review it in the week remaining to the planned release of R6.8.

OK, fair enough.

I'll probably put some more work into it. In particular, I believe that the table of function keys should be turned into a format that makes it possible to have one or more descriptive sentences associated with each function key, to illuminate its source and purpose much better. The current simple names for each function key are often quite cryptic and not very useful definitions of what these keysyms are good for and where they came from.

After that, we can start looking at the various new multimedia/Internet keys on recent PC keyboards, as well as the archeology necessary to uncover what many of the more obscure older function keys were exactly meant to mean. (E.g., why were all the ISO 9995-7 shift keys called ISO ... latch, etc.?)

But before I touch this again, do send me any comments that you have on the current version.

Markus
[I18n] Re: [Xorg] Revision of Appendix A of the X11 Protocol Spec: KEYSYM Encoding
Alex Deucher wrote on 2004-08-16 16:39 UTC:
> >   http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms.pdf
> >   http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms
>
> Please post this as a bug in xorg bugzilla so it can be tracked and properly integrated:
> http://bugs.freedesktop.org

I've added it to

  http://freedesktop.org/bugzilla/show_bug.cgi?id=246

There is also a wiki page on the subject on

  http://freedesktop.org/XOrg/KeySyms

Markus
[I18n] Revision of keysymdef.h
Over the years, a number of problems have piled up with X11/keysymdef.h, which I have set out to sort out. These are in particular:

  - Since X11R6.4 was released, XFree86 added in 1999 and 2000 a large number of new macro definitions that are not in line with the X11 protocol specification (Appendix A), and not all of them have been checked carefully against the Unicode standard.

  - The X11 standard so far lacks an official clarification of the relationship between keysyms and Unicode.

I have therefore reviewed which keysyms are actually used in keyboard mapping files and have substantially reworked and cleaned up the X11/keysymdef.h file. The result is available on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysymdef.h

The database file that I have prepared to control the changes made is on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysyms.txt

and it contains further background information in the form of comments and a status column for each keysym.

Quick summary of the changes:

  - Of the newly added keysyms in the range reserved for the X11 standard, I have preserved only the following ones, as they are now widely used in XFree86 keyboard mapping files:

      0x06ad Ukrainian_ghe_with_upturn
      0x06bd Ukrainian_GHE_WITH_UPTURN
      0xfe60 dead_belowdot
      0xfe61 dead_hook
      0xfe62 dead_horn

    These will have to be added to Appendix A of the protocol spec.

  - A large number of Armenian, Georgian, Irish, Arabic, Cyrillic, Caucasus, and Vietnamese keysym macros were added by Pablo Saratxaga on 1999-06-06 and 2000-10-27. With a very small number of exotic exceptions, none of these keysyms are at present used in any XFree86 files. As many of them look useful, I have decided to remap them directly into the Unicode range by adding 0x01000000 to their Unicode value. This way, these keysyms will not have to be added to the X11 standard, as they are implicitly defined by ISO 10646.

  - I have also moved the currency symbols in the 0x20xx range to the Unicode mapping range, with the exception of the EuroSign, which is the only one of these that is actually used in keyboard mapping files.

  - To each keysym for which there exists a direct or approximate Unicode equivalent, I have added as a comment the Unicode position and name of this character.

  - Various minor editorial fixes for better consistency, and added comments that clarify the Unicode mapping rule.

Unless someone shouts with an objection, I will submit these for inclusion into the X.Org and XFree86 trees soon. A proposed update to the X11 standard will follow shortly.

Markus

--
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain
Re: [I18n] Re: Emulation of Alt+Numpad+Digits behavior
Alan Coopersmith wrote on 2004-08-04 15:00 UTC:
> What I am trying to do is to emulate the MS-Windows behaviour which lets one enter arbitrary characters by using the Alt-Key while entering the character code on the numerical keypad.

The MS-Windows behaviour is somewhat cumbersome for several reasons:

  - It uses decimal numbers, whereas many people seem to be far more familiar with the hexadecimal numbers of well-known non-ASCII Unicode characters (20AC is the euro sign, 2018/2019 are the left/right single quotation marks, 2013/2014 are en/em dash, 2212 is the minus sign, etc.), probably because the Unicode standard prints only the hex codes.

  - It requires two hands to be used simultaneously.

  - It requires the entry of a redundant leading zero, to avoid the MS-DOS CP437 backwards compatibility mode.

There is a much neater alternative standardized in

  ISO/IEC 14755 - Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices
  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14755.pdf

which fixes all these problems.

If you do something in this area, please implement the ISO 14755 hex input method, and not the old MS-Windows one. (Or implement both together, if you really need MS-Windows compatibility here. They don't interfere with each other, because the ISO 14755 technique uses Ctrl-Shift to activate the hex-entry mode, while MS-Windows uses Alt.)

Markus
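To make the hex-entry idea concrete, here is a minimal sketch of the digit accumulation such an input method would need. It is not taken from ISO 14755 or from any existing implementation: the `HexEntry` type and the three function names are hypothetical, and the assumption that the sequence begins when Ctrl-Shift goes down and is committed when it is released is only one possible reading of the scheme described above.

```c
#include <stdint.h>

/* Hypothetical state for one hex-entry sequence (illustrative names). */
typedef struct {
    uint32_t value;    /* code point accumulated so far */
    int      ndigits;  /* number of hex digits entered  */
} HexEntry;

/* Start a new sequence, e.g. when Ctrl-Shift is pressed (assumption). */
void hex_entry_begin(HexEntry *h)
{
    h->value = 0;
    h->ndigits = 0;
}

/* Feed one typed character. Returns 1 if it was accepted as a hex
 * digit, 0 if it is not a hex digit or 8 digits were already entered. */
int hex_entry_feed(HexEntry *h, char c)
{
    int d;

    if (c >= '0' && c <= '9')      d = c - '0';
    else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
    else return 0;

    if (h->ndigits >= 8)           /* keep the value within 32 bits */
        return 0;
    h->value = (h->value << 4) | (uint32_t) d;
    h->ndigits++;
    return 1;
}

/* Finish the sequance, e.g. when Ctrl-Shift is released (assumption).
 * Returns the code point, or -1 if nothing usable was entered. */
long hex_entry_commit(const HexEntry *h)
{
    if (h->ndigits == 0 || h->value > 0x10FFFFu)
        return -1;
    if (h->value >= 0xD800u && h->value <= 0xDFFFu)
        return -1;                 /* surrogates are not characters */
    return (long) h->value;
}
```

Feeding the characters 2, 0, A, C and then committing yields 0x20AC, the euro sign mentioned above.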
Re: [I18n] Keysyms for Unicode values
Dave Williss wrote on 2004-04-14 22:39 UTC:
> I've noticed that for the most part, XKeysym values are just the Unicode value of the character.

No, look closer, they are not. Unicode did not exist yet when keysyms were defined. Keysyms are therefore in a sense something similar to Unicode, but the code positions are completely different.

> However, there are obviously Unicode values which would overlap keysyms in the 0xFE00 to 0xFFFF range which would conflict. I've also noticed references to passing keysyms with the 0x01000000 bit set to mean that the lower part of the keysym is a Unicode value. So my question is, for what Unicode values do I _need_ to use the 0x01000000 bit and what ones should I not? I assume that if just set it for everything, old X clients would be confused by it and not know what to do.

The relationship between the keysyms and Unicode is currently being addressed in a revision of the X11 protocol standard appendix that officially defines the keysyms. You can watch some of this process on the X.Org wiki page

  http://freedesktop.org/XOrg/KeySyms

In a nutshell: for characters for which a keysym already exists (with a few exceptions where the meaning of the keysym is unclear), use the existing keysym value. For characters for which no keysym exists, add 0x01000000 to the Unicode value and use that instead.

An official round-trip compatible Unicode mapping table for the existing keysyms is under preparation and will be part of the next major X.Org release.

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#x11

Markus
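The rule in a nutshell can be sketched as a small lookup function. This is only an illustration: the function name `ucs_to_keysym` is made up here, and the `legacy` table is a three-entry excerpt standing in for the full mapping table of pre-Unicode keysyms, which a real implementation would need in its entirety.

```c
/* Sketch: map a Unicode code point to an X11 keysym value.
 * Returns 0 (NoSymbol) for control characters and out-of-range input. */
unsigned long ucs_to_keysym(unsigned long ucs)
{
    /* Tiny illustrative excerpt of the legacy keysym table. */
    static const struct { unsigned long ucs, keysym; } legacy[] = {
        { 0x20AC, 0x20ac },  /* EuroSign                   */
        { 0x0491, 0x06ad },  /* Ukrainian_ghe_with_upturn  */
        { 0x0490, 0x06bd },  /* Ukrainian_GHE_WITH_UPTURN  */
    };
    unsigned i;

    /* Printable Latin-1 characters have keysyms equal to their code. */
    if ((ucs >= 0x0020 && ucs <= 0x007e) ||
        (ucs >= 0x00a0 && ucs <= 0x00ff))
        return ucs;

    /* An existing keysym always takes precedence. */
    for (i = 0; i < sizeof legacy / sizeof legacy[0]; i++)
        if (legacy[i].ucs == ucs)
            return legacy[i].keysym;

    /* Everything else: add 0x01000000 to the Unicode value. */
    if (ucs >= 0x0100 && ucs <= 0x10FFFF)
        return ucs | 0x01000000UL;

    return 0;
}
```

With this excerpt, U+2212 (minus sign) becomes 0x01002212, while U+20AC keeps its existing keysym 0x20ac.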
[I18n] UTF-8 ICCCM properties
Juliusz wrote in

  http://www.pps.jussieu.fr/~jch/software/UTF8_STRING/UTF8_STRING.text

> In the interest of interoperability, the semantics of the selection target TEXT are *not* changed; in particular, replying with a selection type of type UTF8_STRING to a request specifying TEXT is explicitly *not* allowed.

Question:

That was mostly in the context of how to handle xterm-style cut-and-paste selections. What about the window manager property types

  WM_CLIENT_MACHINE
  WM_ICON_NAME
  WM_NAME

which are all specified in the ICCCM as type TEXT? If we exclude UTF8_STRING from being used where a type TEXT is specified, we will have no portable way of using Unicode in window titles and icon names.

Why can't we just jump into the cold water by adding UTF8_STRING to the list of encodings allowed when the polymorphic type TEXT is used, with the simple restriction that STRING must be used whenever all the characters of the string are contained in Latin-1?

I understand that it may temporarily break a few things if the originator of a property uses UTF8_STRING and the recipient does not yet understand it. But in practice, isn't that just the same situation as we had when COMPOUND_TEXT was added in 1991 and C_STRING was added in 1993, two types that even today are still not supported by most applications?

The only alternatives I could think of are all a bit ugly, such as adding

  WM_CLIENT_MACHINE_UTF8
  WM_ICON_NAME_UTF8
  WM_NAME_UTF8

all of type UTF8_STRING.

Opinions?

Markus
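The proposed restriction (STRING whenever everything fits into Latin-1, UTF8_STRING otherwise) comes down to a simple scan of the text. The sketch below assumes the property text is held as valid UTF-8; both function names are hypothetical, and the additional ICCCM limits on control characters in STRING are ignored for brevity.

```c
#include <stddef.h>

/* Returns 1 if every character of the UTF-8 string s (len bytes)
 * lies in U+0000..U+00FF, 0 otherwise. In UTF-8, exactly the
 * characters U+0080..U+00FF start with the lead byte 0xC2 or 0xC3. */
int utf8_fits_latin1(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        if (s[i] < 0x80) {
            i++;                                   /* ASCII */
        } else if ((s[i] == 0xC2 || s[i] == 0xC3) &&
                   i + 1 < len && (s[i + 1] & 0xC0) == 0x80) {
            i += 2;                                /* U+0080..U+00FF */
        } else {
            return 0;                              /* beyond Latin-1 */
        }
    }
    return 1;
}

/* Name of the property type the proposal would select. */
const char *text_property_type(const unsigned char *s, size_t len)
{
    return utf8_fits_latin1(s, len) ? "STRING" : "UTF8_STRING";
}
```

A client choosing STRING would still have to convert the text to ISO 8859-1 before storing it in the property; the sketch only makes the decision.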
Re: [I18n] Unicode keysym questions
Alexander Krauss wrote on 2004-03-05 17:36 UTC:
> while trying to develop a keymap which includes mathematical symbols, I am wondering about the exact status of the UCS keysyms 0x01000000 and above... Are these already standardized? Do any X servers except XF86 currently use them?

The X.Org Foundation has given me access to their CVS just last week to amend the X11 protocol specification and to make this convention official. I was on a phone conference with them last Monday and they all agreed that adding the 0x01000000 convention to the standard would be most sensible.

> And... how exactly should they be interpreted by clients? Should there be any difference between for example eacute and U00E9?

You will have to continue to use the existing keysyms if a character has one. The +0x01000000 Unicode mapping is exclusively meant for adding any new keysyms for which there isn't already an existing code. This is to preserve backwards compatibility.

Having said that, we may decide to retire a couple of the most obscure of the old keysyms whose semantics have simply been lost in the mist of time, and where we are confident that +/- 0 people are actually using them.

A more tricky question is what to do with the unauthorized addition of new Latin-8, Vietnamese, and Arabic keysyms a while ago by someone in XFree86 in the code space that used to be restricted for X.Org. I am mildly inclined to remove these and replace them with the equivalent +0x01000000 Unicode mappings, in the interest of keeping mapping tables small, but I don't know how widely they have become used since XFree86 added them.

> Should a client interpret a U001B as an escape keystroke

None of the values in the range 0x01000000 to 0x01000100 will technically be assigned keysyms, as all ISO 8859-1 codes already have other code positions assigned. What your client decides to do if it receives one of these nevertheless (or any other random unassigned keysym value) will therefore be outside the X11 protocol specification.

> or are they all by definition characters and should be interpreted e.g. as the user wants this thing in his UTF-8 document...

If you want to add such a function to your client, then that is up to you. However, a correctly configured X11 server should never send out a 0x0100001b keysym. Anything else would be a non-backwards-compatible modification of the X11 protocol that is likely to find resistance within X.Org.

> Or is this simply not strictly defined?

We can define it now as strictly as we want and need, because the text passage that defines it officially will be written over the next few weeks.

> I also noticed that the Compose-Files of 4.3.0 in UTF-8 locales use the U keysyms even for characters that have old keysyms (all the accented latin-{12...} chars).

I would argue that any U notation used in compose files will have to go through a special unicode2keysym conversion function that uses a mapping table. You cannot simply add 0x01000000 to *any* Unicode character to get its keysym. If XFree86 doesn't do that conversion correctly at the moment, please file this in the XFree86 bugzilla so that it will not get lost. Check what keysym values these compose files produce on the wire, which is all that counts in the end.

Markus
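Seen from the client side, the points above could be sketched as follows. The function name `keysym_to_ucs` is hypothetical, the lower bound 0x01000100 for Unicode keysyms is an assumption based on the statement that the Latin-1 positions are not assigned in the Unicode range, and the table lookup needed for the legacy keysyms is left out.

```c
/* Sketch: interpret a keysym received on the wire as a Unicode
 * character. Returns the code point, or -1 if the keysym is a
 * function key, unassigned, or a legacy keysym that would need
 * the (omitted) mapping table. */
long keysym_to_ucs(unsigned long keysym)
{
    /* Latin-1 keysyms coincide with their Unicode code points. */
    if ((keysym >= 0x0020 && keysym <= 0x007e) ||
        (keysym >= 0x00a0 && keysym <= 0x00ff))
        return (long) keysym;

    /* Unicode keysyms: subtract the 0x01000000 offset. Values
     * below 0x01000100, such as 0x0100001b, are not assigned
     * and are rejected here. */
    if (keysym >= 0x01000100UL && keysym <= 0x0110ffffUL)
        return (long) (keysym - 0x01000000UL);

    return -1;
}
```

A client built this way treats 0x0100001b like any other unassigned value, and it still recognizes the Escape key only through its regular function-key keysym.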
[I18n] X.Org (Foundation) waking up / UTF8_STRING / Keysyms
Rumour has it that X.Org, an organization we long believed to be dead, has woken up again, changed its name to X.Org Foundation, and finally wants to update the X11 spec:

  http://www.x.org/XOrg_Foundation.html

UTF8_STRING has already made it onto the agenda:

  http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=xorg_arch&id=16

Keysyms probably will be next.

More on the xorg_arch mailing list:

  http://www.x.org/XOrg_Foundation_Join_OpenLists.html

Markus
Re: [I18n] Unicode
Bharathi S wrote on 2003-01-14 12:17 UTC:
> How to send a 16Bit Unicode value to a Application ? If I use the XmodMap, then Which Xlib function is responsible for taking the Unicode Value frm XModMap ?

Make sure you are in a UTF-8 locale and use the keysym value 0x0100abcd with xmodmap in order to represent the Unicode character U+abcd.

Also read:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#x11

Instead of xmodmap, also consider using xkbcomp.

Markus
Re: [I18n]per mille symbol, ligature symbols?
Andreas Tobler wrote on 2003-01-03 12:36 UTC:
> e.g. key <TLDE> {[0x01000000+0x2030]}

0x01002030

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: http://www.cl.cam.ac.uk/~mgk25/
[I18n]Re: Decimal key on European keyboard layouts
Dr Andrew C Aitchison wrote on 2002-12-13 10:13 UTC:
> Looking at the unicode charts (especially the character name index http://www.unicode.org/charts/charindex.html ) I see that ASCII dot 0x2E has become Unicode 0x002E Decimal Point and ASCII comma 0x2C has become 0x002C decimal separator. http://www.unicode.org/charts/PDF/U0000.pdf renders these in the English way, not the continental one you desire.

  U+002E = FULL STOP
  U+002C = COMMA

There is no question at all in Unicode about how these two characters have to be rendered. Their rendering is locale independent.

There was discussion long ago about adding a decimal separator character to Unicode, but the idea was considered unnecessary and confusing and therefore dropped.

ISO has in the past suggested using a tiny downwards-facing triangle, around the size of a full stop or comma, as a culturally neutral glyph for a decimal separator key, but that too has not caught on.

Markus
Re: Solution. was:[I18n]XFree86 Xutf8LookupString BUG with Solarix X server.
Thanks for the fast bug fix. I hope RedHat/Suse/etc. will fix this in their RPMs *very soon*, otherwise the UTF-8 locales remain completely unusable for everyone on an X server without XKB (e.g., Solaris).

Ivan Pascal wrote on 2002-11-29 12:46 UTC:
> But under the UTF-8 locale XLookupString returns two (and more) chars for non-ascii keysyms. And X{mb|wc|utf8}LookupString mistakely converted them to UTF-8 one more time.

Wasn't Xutf8LookupString supposed to be guaranteed to be locale encoding *independent*? So why does it have to be implemented on top of the (apparently) locale-dependent XLookupString? Sounds not entirely kosher ...

> the X{mb|wc|utf8}LookupString family has a checking which discards non-ascii char outputed by XLookupString if it is only one

Why is it necessary to distinguish between ASCII and non-ASCII characters?

xc/lib/X11/XKB.c

Markus
Re: [I18n]XFree86 Xutf8LookupString BUG with Solarix X server
Juliusz Chroboczek wrote on 2002-11-28 11:36 UTC:
> 'A knot!' said Alice. 'Oh, do let me help to undo it!'
>
> Could you please put an xscope dump on the web somewhere ?

Thanks! I didn't know about xscope. The requested dump is now on

  http://www.cl.cam.ac.uk/~mgk25/ucs/xev-adiaeresis-utf8.txt

including a full description of what I did to get the log.

Markus
[I18n]xev XIM patch
In order to hunt down a very odd problem with the XmbLookupString and Xutf8LookupString functions, I have extended the good old xev command to print the output of these two functions as well, in addition to that of XLookupString.

In order to use XmbLookupString or Xutf8LookupString, one needs to provide an X Input Context (XIC), which one can get after opening an X Input Method (XIM). Not being an XIM guru, I have copied and simplified the minimally necessary code to get it running from xterm-170's charproc.c:VTInitI18N.

Could someone who is at least slightly more familiar than me with XIM matters have a brief look at the attached patch (especially the last hunk), before I send it to the CVS maintainers?

  http://devel:passwd@www.xfree86.org/devel/cgi-bin/cvsweb.cgi/~checkout~/xc/programs/xev/xev.c?content-type=text/plain

Thanks!

Markus

--- xev.c.orig	Tue Nov 26 02:10:51 2002
+++ xev.c	Wed Nov 27 16:15:26 2002
@@ -30,117 +30,152 @@
  */
 /* $XFree86: xc/programs/xev/xev.c,v 1.6 2002/11/26 02:10:51 dawes Exp $ */

 /*
  * Author:  Jim Fulton, MIT X Consortium
  */

 #include <stdio.h>
 #include <stdlib.h>
 #include <ctype.h>
 #include <X11/Xlocale.h>
 #include <X11/Xos.h>
 #include <X11/Xlib.h>
 #include <X11/Xutil.h>
 #include <X11/Xproto.h>

 #define INNER_WINDOW_WIDTH 50
 #define INNER_WINDOW_HEIGHT 50
 #define INNER_WINDOW_BORDER 4
 #define INNER_WINDOW_X 10
 #define INNER_WINDOW_Y 10
 #define OUTER_WINDOW_MIN_WIDTH (INNER_WINDOW_WIDTH + \
                                 2 * (INNER_WINDOW_BORDER + INNER_WINDOW_X))
 #define OUTER_WINDOW_MIN_HEIGHT (INNER_WINDOW_HEIGHT + \
                                 2 * (INNER_WINDOW_BORDER + INNER_WINDOW_Y))
 #define OUTER_WINDOW_DEF_WIDTH (OUTER_WINDOW_MIN_WIDTH + 100)
 #define OUTER_WINDOW_DEF_HEIGHT (OUTER_WINDOW_MIN_HEIGHT + 100)
 #define OUTER_WINDOW_DEF_X 100
 #define OUTER_WINDOW_DEF_Y 100

 typedef unsigned long Pixel;

 const char *Yes = "YES";
 const char *No = "NO";
 const char *Unknown = "unknown";

 const char *ProgramName;
 Display *dpy;
 int screen;
+XIM xim = (XIM) NULL;
+XIC xic = (XIC) NULL;

 void
 prologue (eventp, event_name)
     XEvent *eventp;
     char *event_name;
 {
     XAnyEvent *e = (XAnyEvent *) eventp;

     printf ("\n%s event, serial %ld, synthetic %s, window 0x%lx,\n",
             event_name, e->serial, e->send_event ? Yes : No, e->window);
 }

+void
+hexdump (s, len)
+    char *s;
+    int len;
+{
+    for (; len > 0; len--, s++)
+        printf("%02x%s", (unsigned char) *s, len > 1 ? " " : "");
+}

 void
 do_KeyPress (eventp)
     XEvent *eventp;
 {
     XKeyEvent *e = (XKeyEvent *) eventp;
     KeySym ks;
     char *ksname;
     int nbytes;
     char str[256+1];

     nbytes = XLookupString (e, str, 256, &ks, NULL);
     if (ks == NoSymbol)
         ksname = "NoSymbol";
     else if (!(ksname = XKeysymToString (ks)))
         ksname = "(no name)";
     printf ("    root 0x%lx, subw 0x%lx, time %lu, (%d,%d), root:(%d,%d),\n",
             e->root, e->subwindow, e->time, e->x, e->y, e->x_root, e->y_root);
     printf ("    state 0x%x, keycode %u (keysym 0x%lx, %s), same_screen %s,\n",
             e->state, e->keycode, (unsigned long) ks, ksname,
             e->same_screen ? Yes : No);
     if (nbytes < 0) nbytes = 0;
     if (nbytes > 256) nbytes = 256;
     str[nbytes] = '\0';
-    printf ("    XLookupString gives %d bytes:  \"%s\"\n", nbytes, str);
+    printf ("    XLookupString gives %d bytes:  \"%s\" (", nbytes, str);
+    hexdump(str, nbytes);
+    printf (")\n");
+    if (e->type == KeyPress) {
+        if (xic) {
+            nbytes = XmbLookupString(xic, e, str, 256, &ks, NULL);
+            if (nbytes < 0) nbytes = 0;
+            if (nbytes > 256) nbytes = 256;
+            str[nbytes] = '\0';
+            printf ("    XmbLookupString gives %d bytes:  \"%s\" (",
+                    nbytes, str);
+            hexdump(str, nbytes);
+            printf (")\n");
+        }
+#ifdef X_HAVE_UTF8_STRING
+        if (xic) {
+            nbytes = Xutf8LookupString(xic, e, str, 256, &ks, NULL);
+            if (nbytes < 0) nbytes = 0;
+            if (nbytes > 256) nbytes = 256;
+            str[nbytes] = '\0';
+            printf ("    Xutf8LookupString gives %d bytes:  \"%s\" (",
+                    nbytes, str);
+            hexdump(str, nbytes);
+            printf (")\n");
+        }
+#endif
+    }
 }

 void
 do_KeyRelease (eventp)
     XEvent *eventp;
 {
     do_KeyPress (eventp);		/* since it has the same info */
 }

 void
 do_ButtonPress (eventp)
     XEvent *eventp;
 {
     XButtonEvent *e = (XButtonEvent *) eventp;

     printf ("    root 0x%lx, subw 0x%lx, time %lu, (%d,%d), root:(%d,%d),\n",
             e->root, e->subwindow, e->time, e->x, e->y, e->x_root, e->y_root);
     printf ("    state 0x%x, button %u, same_screen %s\n",
             e->state, e->button, e->same_screen ? Yes : No);
 }

 void
 do_ButtonRelease (eventp)
     XEvent *eventp;
 {
[I18n]XFree86 Xutf8LookupString BUG with Solarix X server
I think I have run into a serious bug with XFree86's Xutf8LookupString implementation. It occurs when the client runs under XFree86 4.[12], but the X server is for example Solaris 5.8

  vendor string:          Sun Microsystems, Inc.
  vendor release number:  6410

(it also occurs on Solaris 5.7). It does not occur when the X server is XFree86 4.1.

i) To reproduce the problem, start an X client in the following environment:

  - use a UTF-8 locale (e.g., LC_CTYPE=en_GB.UTF-8)
  - use an XFree86 4.1 or 4.2 Linux system (tested on Red Hat 7.2, Red Hat 8.0 and SuSE 8.1)
  - point $DISPLAY to a Sun Solaris X server

Then press on the Sun X server a key that causes the keysym adiaeresis to be sent to the above client. The client will receive from the various key string lookup functions the following strings (as displayed in a UTF-8 xterm, hex values provided for clarity):

  XLookupString gives 2 bytes:  "ä" (c3 a4)
  XmbLookupString gives 4 bytes:  "Ã¤" (c3 83 c2 a4)
  Xutf8LookupString gives 4 bytes:  "Ã¤" (c3 83 c2 a4)

There are two problems, the first critical, the second dubious:

  a) CRITICAL: Both X{mb,utf8}LookupString output the same broken byte sequence that one gets if one erroneously sends the UTF-8 sequence for ä (c3 a4) through an ISO 8859-1 -> UTF-8 converter, i.e. c3 83 c2 a4.

  b) DUBIOUS: According to the manual, XLookupString is supposed to *always* return ISO 8859-1 strings (just like STRING atoms always use ISO 8859-1), but here it actually returns text in the locale's multibyte encoding. (This is OK if we can agree to change the libX11 C API definition accordingly, but it looks suspiciously like someone has been HACKing without respect for the API spec.)

ii) If the same setup as in i) is used, but the locale of the client is replaced with an ISO 8859-1 locale (e.g., en_GB), then the result looks correct (as displayed in an ISO 8859-1 xterm):

  XLookupString gives 1 bytes:  "ä" (e4)
  XmbLookupString gives 1 bytes:  "ä" (e4)
  Xutf8LookupString gives 2 bytes:  "ä" (c3 a4)

iii) Similarly, if the same setup as in i) is used, but the X server is XFree86 4 with e.g.

  vendor string:          The XFree86 Project, Inc
  vendor release number:  4010
  XFree86 version:        4.1.0

at least the CRITICAL problem is gone (as displayed in a UTF-8 xterm):

  XLookupString gives 2 bytes:  "ä" (c3 a4)
  XmbLookupString gives 2 bytes:  "ä" (c3 a4)
  Xutf8LookupString gives 2 bytes:  "ä" (c3 a4)

All the above output is from a patched version of xev that outputs the strings from all three lookup functions.

This bug report distills my findings, reported here earlier, that UTF-8 keyboard support fails from a Sun X server with the xterm and emacs implementations in Red Hat 8.0.

Any ideas or reports of reproducibility would be welcome. This might turn into a high-priority problem, as breaking the X protocol this way might be a major UTF-8 show stopper. Please have a look at it ...

Markus
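The byte pattern described under a) can be reproduced outside of Xlib. The following sketch is not the libX11 code path; it is just a plain ISO 8859-1 to UTF-8 converter (the name `latin1_to_utf8` is made up here) that shows what happens when text that is already UTF-8 is converted a second time.

```c
#include <stddef.h>

/* Convert len bytes of ISO 8859-1 text to UTF-8.
 * out must have room for 2 * len bytes. Returns bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];                       /* ASCII: unchanged */
        } else {
            /* U+0080..U+00FF: two-byte sequence 110000xx 10xxxxxx */
            out[n++] = (unsigned char) (0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char) (0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

Applied once to the Latin-1 byte e4 this gives c3 a4. Applied again to c3 a4 it gives c3 83 c2 a4, which is exactly the four-byte string reported for X{mb,utf8}LookupString in case i).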
Re: [I18n]Unicode / 16-Bit
> Is the X (4.x) supporting Unicode/16 Bit encoding ? How ?

Yes, to some degree. To get started, try:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

Markus
Re: [I18n]Syriac keyboard layout
Emil Soleyman-Zomalan wrote on 2002-11-14 16:00 UTC:
> Just for my own knowledge, what would be the disadvantage of creating a new set of keysyms for Syriac as already has been done for several other languages?

There is nothing wrong in principle with adding new keysyms. However, the integer number associated with a new keysym for which there is an equivalent Unicode U-00xxxxxx character MUST be 0x01xxxxxx, because we do not want to let the keysym table grow needlessly (to more than 60000 entries).

Since keysyms are never visible to anyone but people who write keyboard layout tables, it is a senseless exercise to add new ones, so we stopped it entirely and now consider keysyms merely to be a frozen pre-Unicode-era artifact. Keysyms will only be added for keyboard keys for which it is unlikely that there will ever be a good equivalent in Unicode.

Markus
Re: [I18n]Locales and charset encodings
Juliusz Chroboczek wrote on 2002-11-12 16:49 UTC:
> Use nl_langinfo(3), but it's an SUSv2 interface (i.e. it's not in POSIX)

POSIX.{1,2}:2001 and SUSv3 are the same thing now. So nl_langinfo(CODESET) is now actually officially part of the holy dogma of the church of POSIX. Hallelujah.

Unfortunately, the return values of nl_langinfo(CODESET) are not standardized. Fortunately, the systems out there that implement nl_langinfo(CODESET) can be counted on the fingers of one hand, and their outputs are easily normalized by a little routine:

  http://www.cl.cam.ac.uk/~mgk25/ucs/norm_charmap.c

On legacy systems without nl_langinfo(CODESET), you can get a decent educated guess from this routine:

  http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c

More on this issue is on:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate

DW> Would there be a way to _force_ the encoding to, say UTF-8?

Unfortunately not with any formally standardized functions. Given the inertia of standardization bodies, it is currently not feasible to get them to accept that UTF-8 is a bit more useful than yet another multibyte encoding. Until then, XFree86 has added Xutf8*() functions for that exact purpose, and we strongly hope that they will catch on with other implementors as well, though not all have seen the light yet.

Don't expect any progress with the X11 spec soon. What was supposed to be the X11 standards body (X.Org) has been in a coma for several years and is awaiting its death certificate.

Markus
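As a small illustration of the normalization problem, here is a sketch of a UTF-8 check on top of nl_langinfo(CODESET). It is not the norm_charmap.c routine referenced above; the function names are hypothetical, and the only spelling variations it tolerates (letter case and an optional hyphen) are an assumption about what systems return.

```c
#include <ctype.h>
#include <langinfo.h>

/* Returns 1 if codeset names UTF-8 in one of the common spellings
 * ("UTF-8", "utf8", "UTF8", ...), 0 otherwise. */
int codeset_is_utf8(const char *codeset)
{
    static const char want[] = "utf8";
    int j = 0;

    for (; *codeset != '\0'; codeset++) {
        if (*codeset == '-')
            continue;                        /* ignore hyphens */
        if (want[j] == '\0' ||
            tolower((unsigned char) *codeset) != want[j])
            return 0;
        j++;
    }
    return want[j] == '\0';
}

/* Checks the current locale. The caller is expected to have called
 * setlocale(LC_CTYPE, "") beforehand; otherwise the "C" locale's
 * codeset is reported. */
int current_locale_is_utf8(void)
{
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

An application would typically call `setlocale(LC_CTYPE, "")` once at startup and then use `current_locale_is_utf8()` to decide whether to emit UTF-8.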
Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior
I had originally argued strongly in favour of a BACKSPACE display semantics that removes the character left of the cursor (let's call this character L), and then moves the cursor wcwidth(L) character cells to the left. This is by far the most sensible solution, because this way, if you echo the keyboard output back into the display, pressing backspace will give you exactly the same effect as you would expect in an editor. The result would have been that, in order for backspace to work correctly with double-width (and combining) characters, no changes would have to be made to the tty cooked mode editor in the kernel that you get when you type text into stdin of any Unix application.

Unfortunately, existing CJK implementation practice has messed this up and has used backspace with a move-cursor-left-one-cell display semantics. An argument has been made that in UTF-8 modes we have to stay compatible with this highly unfortunate and inconvenient CJK implementation practice, but I am still not convinced that

  a) there really is such a backwards compatibility requirement

  b) the 1-cell-left semantics of backspace has any advantage whatsoever over the erase-1-character-left semantics

I would say at least that the jury on what a backspace sent to a UTF-8 terminal means is still out, and I'd advise authors of editors not to send any backspace 0x08 characters to terminals. Please use absolute or relative cursor positioning command sequences, which have unambiguous semantics.

Markus
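As an illustration of the advice to use relative cursor positioning instead of 0x08, the sketch below formats the standard CUB (Cursor Backward) control sequence, ESC [ n D. The function name `cursor_back_seq` is hypothetical; the number of cells is assumed to be supplied by the caller, for example from wcwidth() of the character being erased.

```c
#include <stdio.h>

/* Write the CUB sequence "ESC [ <cells> D" into buf.
 * Returns the length of the sequence, or -1 if cells is not
 * positive or buf is too small. */
int cursor_back_seq(char *buf, size_t size, int cells)
{
    int n;

    if (cells <= 0)
        return -1;
    n = snprintf(buf, size, "\033[%dD", cells);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}
```

An editor erasing a double-width character would request two cells here and then overwrite them, so the outcome does not depend on how the terminal interprets backspace.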
Re: [I18n]Deadkeys for ntilde and accents on a US keyboard?
Oscar A. Valdez wrote on 2002-10-01 22:37 UTC:

How do I configure XFree86 v. 4.1.0 to get ñ,Ñ,á,é,í,ó and ú on a US-layout keyboard?

If it is only for personal single-user use, then .Xmodmap is still the simplest solution. I routinely type German and English text on a UK keyboard with the following .Xmodmap file (which also disables the annoying caps-lock key):

  ! to get capslock back: xmodmap -e 'add Lock = Caps_Lock'
  clear lock
  keysym a = a NoSymbol adiaeresis NoSymbol
  keysym o = o NoSymbol odiaeresis NoSymbol
  keysym u = u NoSymbol udiaeresis NoSymbol
  keysym s = s NoSymbol ssharp NoSymbol
  keysym p = p NoSymbol section Greek_pi
  keysym d = d NoSymbol degree NoSymbol
  keysym e = e NoSymbol EuroSign NoSymbol
  keysym i = i NoSymbol idiaeresis NoSymbol
  keysym m = m NoSymbol emdash mu
  keysym n = n NoSymbol endash NoSymbol
  keysym space = space NoSymbol nobreakspace NoSymbol
  keycode 34 = bracketleft braceleft leftsinglequotemark leftdoublequotemark
  keycode 35 = bracketright braceright rightsinglequotemark rightdoublequotemark
  keysym minus = minus underscore 0x01002212 NoSymbol

Just add the command

  xmodmap .Xmodmap

to your .xsession file.

  man xmodmap
  man X
  less /usr/include/X11/keysymdef.h

Note that practically all UK PC keyboards have an AltGr key. I understand that in the US, unfortunately, not all PC keyboards have this key available for the Mode_switch keysym, so you have to define some other key (Alt_R, Ctrl_R, F1, etc.) to be the Mode_switch key.

Markus
Re: [I18n]unicode
Viveka Nathan K wrote on 2002-09-30 10:56 UTC:

I wish to use the unicode encoding. How can I know, which applications are supporting the unicode. What should I need to do, to make an application to support unicode ?

Read http://www.cl.cam.ac.uk/~mgk25/unicode.html to get started.

Markus
Re: [I18n]Default fonts for xterm
Tomohiro KUBOTA wrote on 2002-08-26 08:48 UTC:

The first is that simply using *-iso10646-1 fonts as defaults.

This could already be achieved by changing in /usr/lib/X11/fonts/misc/fonts.alias the line

  fixed -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1

to

  fixed -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1

This was my goal many years ago, when I started with the misc-fixed extension project and had no idea how awful the X font system really is. Unfortunately, merely changing "fixed" turned out not to be feasible because

  a) The X protocol is very inefficient in handling sparse 16-bit encodings.
  b) Some legacy applications were unfortunately hardwired to assume that fixed is an 8-bit font.

I think a) is the more critical reason, and a 16-bit font should be used by xterm by default only if a better font API such as Xft is used instead of XLoadQueryFont().

The second solution is to implement UTF-8-specific font configuration items, like uFont, uFont2, uWideFont4, and so on.

This is a very good idea, and I remember that it was discussed and welcomed here before. I had also thought that someone had already written a patch to do this, or had planned to do so, but apparently it never made it into xterm. Using an independent set of font resource entries in UTF-8 mode seems the right thing to do to me.

I think the second one is better, though the first one is simpler and not very harmful. However, I don't have enough time to work on these solutions.

I agree that the second solution is what should be done (but I don't have the time to submit a patch myself either).

Markus
Re: [I18n]Displaying chinese in an xterm
Jungshik Shin wrote on 2002-08-06 15:05 UTC:

You can use one of 18pixel iso10646-1 bitmap fonts included in XF86 4.x with more CJK characters than 13pixel font:

  -misc-fixed-medium-r-normal-ko-18-120-100-100-c-180-iso10646-1
  -misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1

However, I believe neither of them has the full coverage of GB 2312.

From README in http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

  12x13ja.bdf: Covers all CP1252, CP437, JIS X 0208, and Hangul characters, and a few more. This font is primarily intended to provide Japanese full-width Hiragana, Katakana, and Kanji for applications that take the remaining (halfwidth) characters from 6x13.bdf. Might in the future be extended to cover TARGET2 if there is sufficient interest in using it as a stand-alone fixed-width font without 6x13. The Greek lowercase characters in it are still a bit ugly and will need some work.

  18x18ja.bdf: Covers all JIS X 0208, JIS X 0212, GB 2312-80, KS X 1001:1992, ISO 8859-1,2,3,4,5,7,9,10,15, CP437, CP850 and CP1252 characters, plus a few more, where priority was given to Japanese han style variants. This font should have everything needed to cover the full ISO-2022-JP-2 (RFC 1554) repertoire. This font is primarily intended to provide Japanese full-width Hiragana, Katakana, and Kanji for applications that take the remaining (halfwidth) characters from 9x18.bdf.

  18x18ko.bdf: Covers the same repertoire as 18x18ja plus full coverage of all Hangul syllables and priority was given to Hanja glyphs in the unified CJK area as they are used for writing Korean.

What admittedly is still missing is an 18x18zh.bdf font that gives priority to Chinese style variants, but GB 2312 is certainly covered by both 18x18 fonts.

Markus
[I18n]Re: Welsh support needed for XFree86
Attached is an old email that represents the most authoritative information that I have on the diacritic characters used in dictionaries of the Welsh language.

Hope this helped ...

Markus

--- Forwarded Message

Date: Tue, 18 Aug 1998 17:10:15 +0100
To: [EMAIL PROTECTED]
From: Andrew Hawke [EMAIL PROTECTED]
Subject: Welsh character sets (LONG MESSAGE)

Markus, you e-mailed [EMAIL PROTECTED] regarding the frequency of certain Welsh letter+accent combinations. He submitted your query to the WELSH-L discussion list. I have replied to the list, but I also felt that I should take the liberty of contacting you directly, as this is something I have strong views on.

Some background: I am Assistant Editor and Systems Manager for the University of Wales Dictionary of Welsh, the standard scholarly dictionary of the language. I also chair the Celtic Texts Specialist Group of the International Association of Literary and Linguistic Computing. The University of Wales has an orthography committee which publishes guidelines for Welsh spelling which are accepted by all Welsh writers and publishers. These notes are based on those guidelines.

Welsh is now legally one of the two official languages of Wales, on an equal legal footing with English. The government has established a body called the Welsh Language Board to promote the use of Welsh. The language is now taught in every school in Wales (and is the main language of instruction in many of them). Some 600 books and many magazines and newspapers are published annually. The use of the language in all spheres, and increasingly in business, public life, the administration of justice, education, government and the media (there is a Welsh-language TV channel) is growing rapidly. Welsh is spoken by approximately 500,000 people in Wales, and by several hundred thousand outside Wales.
The number of speakers showed a slight increase at the last census, after nearly a century of continuous decline.

The availability of character sets to represent the language is absolutely essential, and such character sets should be as complete as possible. In the past, the lack of appropriate character sets has been a considerable deterrent to using the language in print and electronically. I would urge you to bear this in mind when considering the following.

Johann van Wingen (of the Netherlands WG on ISO 10646) pushed hard for the inclusion of all the possible Welsh letter/accent combinations, which was eventually accepted by the ISO and subsequently Unicode. Microsoft has also committed to including the 13 additional characters in its OpenType fonts. I have communicated extensively on this point with John Hudson of Tiro Typeworks in Vancouver (www.tiro.com) who has been working on OpenType fonts for Microsoft and for academic purposes. I reproduce below my main comments to him which may be of assistance to you.

= COPIED MATERIAL FOLLOWS

Modern usage of the diacritics in Welsh is as follows:

(All diacritics are shown following the vowel which is accented, e.g. a^ represents a lower-case a with a circumflex accent.)

Welsh requires the circumflex (^), acute ('), grave (`), and diaeresis (") on all vowels, i.e. a, e, i, o, u, w, y (w being used in Welsh both as a vowel and a semi-vowel). The incidence of these combinations varies very widely.

All diacritics (accents) in Modern standard Welsh are compulsory and are used to differentiate between different pronunciations of otherwise similar- or identical-looking words, either in terms of length (long vs. short) or stress. The stress accent in Welsh always falls on the penultimate syllable, unless an accent (or a hyphen or an inserted h) indicates otherwise. BECAUSE OF THIS, ALL THE ACCENTED WELSH CHARACTERS ARE REQUIRED, IN BOTH UPPER- AND LOWER-CASE FORMS.
The circumflex is used solely to indicate that a vowel is long in a context in which it would normally be expected to be short, e.g.:

  gwa^n `he pierces'            vs.  gwan `weak'
  gwe^n `a smile'               vs.  gwen `white (fem.)'
  pi^n  `pine (wood, tree)'     vs.  pi`n `a pin'
  co^r  `a choir'               vs.  cor  `a dwarf'
  bu^m  `I was (perfect)'       vs.  bum  `five (mutated)'
  tw^r  `a tower'               vs.  twr  `a group'
  y^m   `we are'                vs.  ym   `in (before m)'

The diaeresis is used to separate vowels, as in English:

  prosaig `prosaic', crewr `creator', copio `to copy', troedigaeth `conversion', duwch `blackness', Rebacayddiaeth `Rebaccaism', cywres `concubine'

The acute accent is used to indicate unexpected stress (i.e. not on the penultimate):

  casa'u `to hate', case't `cassette', ricri'wt `a recruit', paraso'l `a parasol', rebu'wc `a rebuke', caridy'ms `riff-raff', gw'raidd `manly'
[I18n]Re: POSIX:2001 freely available, STIX Fonts
Keith Packard wrote on 2002-07-10 01:09 UTC:

AFAIK, SUS and POSIX say that it's implementation-dependent. Too bad the POSIX spec is closed so I can't check.

For all of you who haven't heard yet, SUS3 and POSIX:2001 are now the same thing and are freely available online on

  http://www.unix-systems.org/version3/

Bookmark now.

Also interesting: the folks who brought you the free Type1 versions of Computer Modern have agreed to put together a comprehensive free high-quality Unicode font for scientific publishing:

  http://www.stixfonts.org/

Markus
Re: [Fonts]Re: [I18n]language tags in fontconfig
Keith Packard wrote on 2002-07-06 10:34 UTC:

I got the European coverage information from http://www.evertype.com/alphabets/ I don't know why all of the latin languages include @ and ', it's probably just a mistake; they're easily removed.

Actually, thanks to URLs and email addresses, which can and do contain *all* ASCII characters, in practice full 7-bit ASCII coverage is required for writing *any* contemporary language. Only Romans, Egyptians, Babylonians, Etruscans, etc. still get away without email ... :)

In addition, UCS specifically states that no UCS subset should exclude the Basic Latin range of U+0020 to U+007E.

Therefore, the ASCII coverage of Michael Everson's alphabet list should rather be seen as an academic curiosity, and not as relevant to implementations.

Markus
[I18n]Please do not use en_US.UTF-8 outside the US
As we are talking about en_US.UTF-8:

General warning: Please do not use the locale name en_US.UTF-8 anywhere outside North America.

Some older Solaris documentation suggested that this is the only UTF-8 locale you'll ever need, as locales don't change much of significance beyond the encoding anyway. This is not the case any more today!

An increasing number of programs of US origin are finally starting to abandon the annoying old habit of assuming Letter paper and non-metric units as default conventions everywhere, requiring 95% of the world population to figure out how to reconfigure to the standard conventions. More recent software releases instead determine the default setting for conventions such as paper format and units of measurement with code similar to the following (feel free to copy it into your software as well):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* LC_PAPER and LC_MEASUREMENT were introduced in ISO/IEC TR 14652 */

  int main()
  {
    const char *units = "mm";
    const char *paper = "A4";
    char *s;

    /* first non-empty variable wins: LC_ALL, then LC_PAPER, then LANG */
    if (((s = getenv("LC_ALL"))         && *s) ||
        ((s = getenv("LC_PAPER"))       && *s) ||
        ((s = getenv("LANG"))           && *s))
      if (strstr(s, "_US") || strstr(s, "_CA"))
        paper = "Letter";
    /* same precedence for the units of measurement */
    if (((s = getenv("LC_ALL"))         && *s) ||
        ((s = getenv("LC_MEASUREMENT")) && *s) ||
        ((s = getenv("LANG"))           && *s))
      if (strstr(s, "_US"))
        units = "inches";
    printf("Paper: %s\nUnits: %s\n", paper, units);
    return 0;
  }

This leads to portable and agreeable default settings, using the standard values UNLESS you are in a locale that explicitly says that you are in North America.

I think that's a very good implementation practice, but it requires that if you explain to an international audience how to activate UTF-8 locales, you had better use a non-US/CA locale. (en_GB.UTF-8 for instance seems like an excellent choice ... :)

Markus
Re: [I18n]So, will Bidi+Xterm happen ?
Nadim Shaikli wrote on 2002-02-28 23:48 UTC:

What are your comments on mlterm, patch27, biditext (have you used 'em) ?

Can you send me a compact, exact specification of the bidi semantics of these implementations? I haven't seen one yet and I don't have the time to reverse engineer these.

If cat works, this means nothing, as this just tests what the terminal does when you send paragraphs with CRLF terminated lines to it and the cursor is at the bottom of the screen. This tests only the most trivial case of bidi functionality.

I am far more worried about the sort of ESC sequences that vim, readline, and ncurses use to talk to the terminal and how they interact with the bidi. What does it mean to delete a character in bidi mode (in which direction does it move and what happens if it hits a bidi boundary), etc. Will it work with the tty cooked mode?

Forget about cat, think about editors, starting from the most primitive ones.

Markus
Re: [I18n]BiDi rant
Mark Leisher wrote on 2002-02-13 00:29 UTC:

I respect Markus too much to think this was anything more than a subconscious plea for simplicity and symmetry, born of irritation with messy reality. Sort of absentmindedly muttering out loud when someone is nearby.

What I primarily wanted to remind people of is that bidi and VT100-style terminal semantics do not mix well at all, and that just repeatedly reminding us of the user requirements/wishes/dreams in that area will not change the fact that it is a fundamentally tedious and difficult subject. There are perhaps good reasons why ECMA-48 hasn't yet been fully implemented, and I perceive at least some agreement that xterm is probably not the right level at which to implement bidi, in particular not for editing and cursor control.

There are good reasons why there is no long-standing successful commercial tradition of Arabic/Hebrew VT100-style terminals. I think users of Arabic, Hebrew (and perhaps also Indic scripts) should best focus on non-terminal GUI applications, where bidi can be easily performed in one go at the paragraph level, and simply not expect from the character-cell terminal environment the exact same level of functionality and convenience that LTR users enjoy.

Markus
Re: [I18n]BiDi rant
Tzafrir Cohen wrote on 2002-02-12 18:46 UTC:

I think it might be a good idea to really keep bidi completely out of xterm. If people want to play around with bidi terminal semantics, then I would suggest that they build a filter that can be plugged in between the LTR terminal and the application, just like Juliusz' luit does already for non-UTF-8 encodings.

How can I disable bidi support at run-time with such a model?

The bidi filter would intercept both the character streams to and from the terminal emulator. It therefore can intercept not only ESC sequences from the applications but also special hot-key keystroke sequences from the terminal that can be used to change its parameters or bring up a little menu.

GNU screen does all this already. It was written primarily to

  a) allow you to multiplex several terminal connections via a single physical or emulated terminal
  b) add support for cut&paste to physical or emulated terminals that don't have this facility
  c) allow you to detach a virtual terminal and move it onto another physical terminal without closing the session
  d) provide a high-quality emulation of a VT100 terminal on a low-quality termcap terminal
  e) perform character encoding conversion and/or transliteration (luit does that part as well)

and I guess bidi things could be added to it as well.

I got my first degree at the University of Erlangen, where lots of terminal users at the CS department used screen (because VT100-compatible terminals were available ubiquitously in 1990, but good high-res monitors for X11 were still relatively rare in undergraduate teaching rooms). I am repeatedly surprised how little this marvelous terminal emulator tool is known elsewhere today.

  ftp://ftp.uni-erlangen.de/pub/utilities/screen/
  ftp://ftp.uni-erlangen.de/pub/utilities/screen/private/screen-3.9.9beta1.tar.gz

The advantage of the filter approach is that it works with pretty much any terminal emulator, not only with xterm.
You can also decide in remote communication scenarios whether to install the filter on the host or on the terminal emulator machine.

If you want to play around with the idea, look at either screen or luit as freely available starting points.

  http://www.pps.jussieu.fr/~jch/software/luit/

What you cannot automate, however, is the people reading those texts. It is also impossible to reprint all the existing texts (consider that the bible will have to be written in original hebrew, for instance). As I said: you are not the first to suggest this.

It's certainly difficult, but not impossible. Turkey is the major successful case I know of where a script reform has succeeded. It was driven by a major political move to make the country overall more secular and compatible with Europe. Germany's abandoning of fraktur probably doesn't count, as that's really just a different font style, not a fundamentally differently structured alphabet or reading direction. Does anyone know of any other examples of successful major script reforms (apart from the semi-successful Soviet attempts to force all their republics to switch to cyrillic)?

Markus
[I18n]Re: XLFD subsetting and bdftruncate
Moe Elzubeir wrote on 2002-01-22 17:08 UTC:

The subsetting system is in place already, so now what?

I still have not fully understood what exactly is in place and how well it works in X11R6. For example

  xfd -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-*-75-75-C-*-ISO10646-1[0_0xff]'

works (and returns just the Latin-1 part of the Unicode font), but then

  xfd -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1[0_0xff]'

does *NOT* work and returns (when no bdftruncate is used) the full 700 kilobyte large XFontStruct that we fear so much.

What is going on here? I suspect nothing has actually changed since I did my tests a few years ago; it is just that Juliusz's example XLFDs contained wildcards in the right places, whereas I always used the full XLFD as it stands in the BDF file. What difference does that make in the font mechanics of the X server?

This is getting stranger and stranger, and before we start to rely on the subsetting, I strongly suggest that someone looks into exactly which parts of it work, so that it gets properly documented first. Or that someone eliminates what might just be a bug, namely that subsetting only works with enough wildcards.

Markus
Re: [I18n]Re: [Devel] Re: [Fonts]Another approach to text in X
Alexander Gelfenbain wrote on 2002-01-17 19:59 UTC:

I can confirm that the license ST will be released with is BSD+ which is standard BSD with the following clause:

  * You acknowledge that this software is not designed, licensed or intended
  * for use in the design, construction, operation or maintenance of any
  * nuclear facility.

Just curious: Was this a legal or political requirement?

I'm not sure that the high-energy physicists on the other side of the street from here will in practice be aware of such a strange restriction once they get this package via the next SuSE or Solaris update on their office machines. They are in the profession of doing rather cruel things to atomic nuclei, and they design and run facilities to do so.

We are working on publicly releasing the docs and placing the source code on sourceforge or some other public CVS server. Please stay tuned.

I'm very much looking forward to seeing it.

Markus
Re: [I18n]U+3000 limit
Tomohiro KUBOTA wrote on 2002-01-10 10:58 UTC:

At Wed, 09 Jan 2002 18:07:03 +, Markus Kuhn wrote:

.. unless an explicit subrange specification is present, such that people have to write *-iso10646-1[0_0x] if they are sure that they want to have the full font. In other words, allow the specification of a default subrange for sparsely populated ISO10646-1 fonts (e.g., those with more than 90% of their characters below 0x3000).

How such range limitation will be used? By knowledged end-users who knows (s)he doesn't need U+3000 characters? Or, automatically set based on locale? Or, as a hard-coded default font by foolish software developers who assume computers are used only by U+3000 people?

You could for example simply cut the Unicode character space into 16 intervals 0x0XXX, 0x1XXX, etc. and open each interval as soon as you encounter a glyph from it. Many widget sets (e.g., Tk) open fonts only when needed, and that extends naturally to font subranges.

Markus
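The idea in the message above of cutting the 16-bit character space into 16 intervals and opening each one on first use might be sketched in C roughly as follows. All names here (interval_of, interval_set, need_interval, and so on) are invented for this illustration; the actual font loading that would happen when an interval is first needed is left out.

```c
#include <stdint.h>

/* Index (0..15) of the interval 0xNXXX that contains a 16-bit
 * code point; each interval spans 4096 code points. */
unsigned interval_of(uint16_t cp)
{
    return cp >> 12;
}

/* First and last code point of interval n (n in 0..15). */
uint16_t interval_first(unsigned n) { return (uint16_t)(n << 12); }
uint16_t interval_last(unsigned n)  { return (uint16_t)((n << 12) | 0x0FFF); }

/* Bit n is set once interval n has been opened. */
typedef uint16_t interval_set;

/* Records that a glyph for cp is wanted. Returns 1 if this is the
 * first glyph from its interval, meaning the caller should now open
 * the matching font subrange, and 0 if the interval is already open. */
int need_interval(interval_set *opened, uint16_t cp)
{
    interval_set bit = (interval_set)(1u << interval_of(cp));

    if (*opened & bit)
        return 0;
    *opened |= bit;
    return 1;
}
```

With this bookkeeping, a program that only ever displays Latin text would open just interval 0, while the first CJK character would trigger opening the interval that contains it.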
[I18n]Re: U+3000 limit
Moe Elzubeir wrote on 2002-01-08 00:12 UTC:

I have been looking into the U+3000 limit and how the 10x20 font is being truncated to save memory space. This 'truncation' of the 10x20 for 'optimization' is seriously hampering our efforts to bring Arabic support on platforms where XFree86 runs.

I have updated

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#xfontstruct

to tell the full story on this subject.

Markus
Re: [I18n]U+3000 limit
Juliusz Chroboczek wrote on 2002-01-08 14:16 UTC:

Font subsetting is fully implemented in the BDF, PCF, Type 1, Speedo and freetype backends. I haven't checked the SNF or X-TT backends. Try

  xfd -fn '-misc-fixed-medium-r-semicondensed--13-*-75-75-c-*-iso8859-1[65_90]'

Very nice, I hadn't seen that! Works fine for my XFree86 4.0.3 installation here. Since which release exactly did this work?

So I think we can now drop bdftruncate from the ucs-fonts installation procedure, as people merely have to add [0_0x31ff] to an XLFD to achieve the same effect. Any opinions?

Markus
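For programs that build font names at run time, appending a subrange such as [65_90] to an XLFD as shown in the message above could be done with a small helper like the one below. This is only a sketch: the function name xlfd_subset() is made up for this example, and it emits decimal bounds, following the [65_90] form quoted above.

```c
#include <stdio.h>

/* Hypothetical helper: writes "<xlfd>[<first>_<last>]" into buf.
 * Returns the length of the resulting string, or 0 if first > last
 * or the buffer is too small to hold the whole name. */
int xlfd_subset(char *buf, size_t size, const char *xlfd,
                unsigned first, unsigned last)
{
    int n;

    if (first > last)
        return 0;
    n = snprintf(buf, size, "%s[%u_%u]", xlfd, first, last);
    return (n > 0 && (size_t)n < size) ? n : 0;
}
```

The resulting string would then be passed to whatever font-opening call the application already uses, in the same way as the xfd command line above.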
[I18n]Re: RENDER performance
Keith Packard wrote on 2001-12-28 19:54 UTC:

I should have monochrome text running in a week or so to give people a chance to experiment with performance over links of various sorts. When I've done this in other environments, I've found performance to be acceptable down to 2B ISDN speeds; others may have different opinions.

I assume that is with some contemporary pixel size r. Unless you use some good compression technique, the amount of glyph data to be transmitted will be proportional to r^{-2}. With some good textual image compression systems (PNG, G4FAX, JBIG, etc.) used on the bitmaps, it might become proportional to around r^{-1.3}.

Pixel sizes for color CRTs have stabilized now at around 0.22-0.25 mm, as smaller aperture masks are not feasible. But who knows what's coming next? Pixel sizes down to 0.05-0.10 mm, as we have already with laser printers, would certainly be desirable for e-book applications, etc.

Markus
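To put numbers on the two exponents mentioned above, here is a short worked example. It reads r^{-2} and r^{-1.3} as describing how the volume of glyph data grows as the pixel size r shrinks, writing N for uncompressed and C for compressed data (both symbols are introduced here only for illustration), and it compares a 0.25 mm CRT pixel with a 0.05 mm laser-printer-like pixel, i.e. a factor of 5 in r.

```latex
\[
  N(r) \propto r^{-2}
  \quad\Longrightarrow\quad
  \frac{N(0.05\,\mathrm{mm})}{N(0.25\,\mathrm{mm})} = 5^{2} = 25
\]
\[
  C(r) \propto r^{-1.3}
  \quad\Longrightarrow\quad
  \frac{C(0.05\,\mathrm{mm})}{C(0.25\,\mathrm{mm})} = 5^{1.3} \approx 8.1
\]
```

Under these assumptions, the same text at the finer resolution would need about 25 times as much uncompressed bitmap data, but only about 8 times as much if the compression behaves as estimated.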
[I18n]Re: X and Supplementary Planes
Roozbeh Pournader wrote on 2001-12-27 22:58 UTC:

I remember the discussion here about the font naming and structure issues for non-BMP characters. But I cannot remember the outcome (and if there were oppositions). Since we are thinking about doing some work on Pango to support non-BMP characters, I wanted to ask for a briefing...

I am more and more convinced that if we are going to do anything of this sort on the old XLFD front, then it should be the definition of a new ISO10646-C encoding, which is a glyph encoding and which has in its properties character/glyph mappings. This encoding would come together with little and highly efficient C functions

  makeiso10646cglyphmap(XFontStruct *font, iso10646cglyphmap *map);

    reads the character-to-glyph mapping table from the font properties into a compact and efficient in-memory representation

  freeiso10646cglyphmap(iso10646cglyphmap *map);

    frees that in-memory representation

  mbtoiso10646c(char *string, iso10646cglyphmap *map, XChar2b *output);
  wctoiso10646c(wchar_t *string, iso10646cglyphmap *map, XChar2b *output);

    take a Unicode character string and convert it to an XChar2b glyph string suitable for output by XDrawString16 with the ISO10646-C font from which the iso10646cglyphmap was extracted.

ISO10646-C fonts would still be limited to have not more than 64 kibiglyphs, but these can come from anywhere in UCS, not just from the BMP. This solution also easily provides for glyph substitution, such that we can finally handle the Indic fonts. It solves the huge-XFontStruct problem of ISO10646-1, as XFontStruct grows now proportionally with the number of glyphs, not with the highest characters. It could also provide for simple overstriking combining characters, but then the glyphs for combining characters would have to be stored with negative width inside an ISO10646-C font.
It can even provide support for variable combining accent positions, by having several alternative combining glyphs with accents at different heights for the same combining character, and the ligature substitution tables would encode which combining glyph to use with which base character.

Looks all very easily doable to me. Someone would just have to sit down and write a proper spec for ISO10646-C fonts plus the above mentioned client-side Unicode -> ISO10646-C character-glyph-conversion routine, and then we can start producing fonts and tools.

Unfortunately, I don't have the time to start this in the foreseeable future. Any volunteers interested in writing a first draft (preferably in the same troff format in which the other X specs are already written)?

Markus

P.S.: The C in ISO10646-C stands for combining, complex, compact, or character-glyph mapped, as you prefer. (And please don't start again with the fuss about C not being a part number of an ISO standard.) For those who think that all this is obsolete, remember that it is the only solution proposed so far that makes efficient complex script rendering available to legacy X terminals with immutable ROM servers and no RENDER or ST extension.
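To make the proposed character-to-glyph mapping a little more concrete, here is a minimal C sketch of what a compact in-memory map and its lookup could look like. It is not the ISO10646-C design itself, which was never specified in this thread: the types glyph_entry and glyph_map and the function lookup_glyph() are invented for this example, a plain 16-bit integer stands in for XChar2b so that the sketch does not depend on Xlib, and glyph substitution and combining are left out entirely.

```c
#include <stdint.h>
#include <stdlib.h>

/* One mapping from a UCS code point (possibly outside the BMP)
 * to a 16-bit glyph index inside the font. */
typedef struct {
    uint32_t ucs;
    uint16_t glyph;
} glyph_entry;

/* The whole map: entries must be sorted by ascending ucs value. */
typedef struct {
    const glyph_entry *entries;
    size_t count;
} glyph_map;

/* Comparison callback for bsearch(): key is a code point. */
static int cmp_entry(const void *key, const void *elem)
{
    uint32_t ucs = *(const uint32_t *)key;
    const glyph_entry *e = elem;

    return (ucs > e->ucs) - (ucs < e->ucs);
}

/* Looks up the glyph for a code point by binary search. Returns 1
 * and stores the glyph index in *glyph if the font has one, else 0. */
int lookup_glyph(const glyph_map *map, uint32_t ucs, uint16_t *glyph)
{
    const glyph_entry *e;

    if (map->count == 0)
        return 0;
    e = bsearch(&ucs, map->entries, map->count,
                sizeof map->entries[0], cmp_entry);
    if (e == NULL)
        return 0;
    *glyph = e->glyph;
    return 1;
}
```

The memory used by such a table grows with the number of glyphs in the font rather than with the highest code point, which is the property the proposal above is after.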
Re: [I18n]US-ASCII part of CJK TTFs served by freetype and xtt backends
On Wed, 24 Oct 2001, Jungshik Shin wrote:

JC If you desire a different behaviour, you should either try to get your
JC applications to work with `-p-' fonts, or push for a ``biwidth'' `-b-'
JC spacing type to be included in a future versions of the XLFD.

If the answer to my two questions above is no, I don't think '-b-' is necessary. '-b-' is only necessary for 'iso10646-1' fonts derived from CJK TTFs, but I'm not talking about them (for them, I suggested 'subsetting' as a way around in my previous mesg.). Of course, if the answer is yes, my point is moot.

We discussed biwidth (-b-) fonts ages ago, but since then I thought it had been agreed that splitting bi-width fonts up into two charcell (-c-) XLFDs is actually better, because it leaves the application to decide which width to use for which character, especially as CJK and Western habits differ here significantly.

Markus
Re: [I18n]xterm-158, XIM and UTF-8
On Wed, 12 Sep 2001, Steve Swales wrote:

XIM is working very well with the CSI (code-set independent) version of xterm provided by Li18nux.org (patches from IBM). It works equally well in our UTF-8 locales and our non-UTF-8 locales. Because of this (CSI), Sun will be adopting this version of xterm, rather than the utf-8 hardwired one, for a future release of Solaris. We are working with the patch developers to enhance and extend this implementation to make it a fully functional, code-set independent, internationalized terminal emulator. We will, obviously, be providing these enhancements back to the community, and Sun will be promoting this xterm at X.Org as well.

I'm looking forward to seeing these enhancements to the X.Org xterm.

Like Juliusz, I hope that Sun is aware that the current CSI API provides a functionality significantly more restricted than what our existing hardwired xterm offers already. I hope Sun is fully aware of the significant extensions that have to be made to the CSI concept (which was originally developed purely with the requirements of ISO 8859 and CJK legacy encodings in mind) in order to cover the additional specific functionality required for proper Unicode support.

Before XFree86 considers abandoning its current xterm UCS extensions, I'd hope that Sun's CSI equivalent will feature equivalent functionality, for example:

  - Support for at least two overstriking combining characters, as they are essential for support of the Thai, Lao and other scripts.

  - Selection of glyphs from the single-width and double-width font based on either libc wcwidth() or the current XFree86 convention documented in http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

  - Support for UTF8_STRING selections independent of the current locale, in order to facilitate the simple and effective exchange of data between applications running in different locales or the many important already existing Unicode-only applications.
I also recommend a careful comparison of the currently used rather buggy and incomplete X.Org keysym-Unicode table with the more up-to-date XFree86 table on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysym2ucs.c

If you are working on a XIM for UTF-8 locales, I'd also like to draw your attention to

  ISO/IEC 14755
  Information Technology -- Input methods to enter characters from the repertoire of ISO/IEC 10646 with a keyboard or other input devices
  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14755.pdf

which describes a number of basic universal character entry methods that would be extremely useful to have integrated into XIM, such that independent of the current locale, characters can always also be entered into any application via their Unicode hex value.

Our experience has been that the interactions of UTF-8 and the VT100 semantics offered by xterm can be very tricky, and I'd like to encourage you to make development snapshots of your xterm release available for alpha testing by Li18nux, XFree86, and text mode editor developers early and often, such that we can provide thorough debugging and feedback for the implementation long before it sets a standard by becoming part of an X.Org release.

Markus
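As a small illustration of entering a character via its Unicode hex value, as mentioned above, the following C sketch turns a string of hex digits into the corresponding UTF-8 byte sequence. It is not taken from ISO/IEC 14755 or from any XIM implementation; the function names utf8_encode() and hex_to_utf8() are invented here, and the sketch assumes the modern restriction of UTF-8 to code points up to U+10FFFF with surrogates excluded.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

/* Encodes one code point as UTF-8 into out (room for 4 bytes needed).
 * Returns the number of bytes written, or 0 for surrogates and for
 * values above U+10FFFF. */
size_t utf8_encode(uint32_t ucs, unsigned char *out)
{
    if (ucs < 0x80) {
        out[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs < 0x800) {
        out[0] = (unsigned char)(0xC0 | (ucs >> 6));
        out[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 2;
    }
    if (ucs >= 0xD800 && ucs <= 0xDFFF)
        return 0;                          /* surrogates are not characters */
    if (ucs < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (ucs >> 12));
        out[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 3;
    }
    if (ucs <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (ucs >> 18));
        out[1] = (unsigned char)(0x80 | ((ucs >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 4;
    }
    return 0;
}

/* Parses a string of hex digits such as "20AC" and encodes the
 * resulting code point as UTF-8. Returns the number of bytes
 * written, or 0 if the input is not a valid code point in hex. */
size_t hex_to_utf8(const char *hex, unsigned char *out)
{
    char *end;
    unsigned long v;

    if (hex == NULL || !isxdigit((unsigned char)*hex))
        return 0;
    v = strtoul(hex, &end, 16);
    if (*end != '\0' || v > 0x10FFFF)
        return 0;
    return utf8_encode((uint32_t)v, out);
}
```

An input method offering hex entry would collect the typed digits, call something like hex_to_utf8() when the entry is confirmed, and deliver the resulting bytes to the application.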