from:"Markus Kuhn"

Re: [I18n] urdu keymap

2004-09-01 Thread Markus Kuhn

A brief purist's rant on keymaps and keysyms:

X11 keymaps were originally merely meant to represent what symbols are
printed/engraved on the keycaps of the keyboard hardware used (see
Appendix A of the X11 protocol specification).

Is the proposed Urdu keyboard layout already used by any keyboard
manufacturer, or is this foreseen in the immediate future? Can you send
us a photo of a production prototype of that keyboard?

I am asking because I rather doubt that either XFree86 or X.Org is
particularly interested in becoming a repository for invisible
convenience keymaps that do not actually reflect the symbols
visible on real-world hardware. People can always configure such
convenience additions privately using xmodmap. Maintaining a repository
of personal convenience keymaps sounds like a very open-ended endeavor.

Perhaps there should be a requirement to provide a high-resolution
photograph of a real keyboard along with each keymap that gets into the
X11 distribution? That would also tremendously simplify the
identification of which keymap belongs to which hardware, and it would
help us at least a bit to weed out all the non-existent fantasy
keyboards.

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n

Re: [I18n] urdu keymap

2004-09-01 Thread Markus Kuhn

Kakilik Group wrote on 2004-09-01 15:42 UTC:
> There is no hardware keyboard for the Turkmen
> language (my language). Its alphabet is too
> young (1991-...).

Does the keysymdef.h file that we are currently revising on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysymdef.h

cover all the characters that you need?

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

[I18n] Re: Revision of Appendix A of the X11 Protocol Spec: KEYSYM Encoding

2004-08-17 Thread Markus Kuhn

Alan Coopersmith wrote on 2004-08-16 16:35 UTC:
> Markus Kuhn wrote:
> > I have substantially revised and updated the long neglected KEYSYM
> > Encoding specification in Appendix A of the X11 Protocol Standard. The
> > result, which I propose for inclusion into the next X.Org release,
> > is on
> >
> >   http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms.pdf
>
> While this looks good at a first glance, I think at this point it will
> have to wait for the release after X11R6.8 since there's simply not time
> for everyone to review it in the week remaining to the planned release
> of R6.8.

OK, fair enough. I'll probably put some more work into it. In
particular, I believe that the table of function keys should probably be
turned into a format that makes it possible to have one or more
descriptive sentences associated with each function key, to illuminate
its source and purpose much better. The current simple names for each
function key are often quite cryptic and not exactly very useful
definitions of what these keysyms are good for and where they came from.
After that, we can start looking at the various new multimedia/Internet
keys on recent PC keyboards, as well as the archeology necessary to
uncover what many of the more obscure older function keys were exactly
meant to mean. (E.g., why were all the ISO 9995-7 shift keys called
ISO ... latch, etc.?)

But before I touch this again, do send me any comments that you have on
the current version.

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

[I18n] Re: [Xorg] Revision of Appendix A of the X11 Protocol Spec: KEYSYM Encoding

2004-08-16 Thread Markus Kuhn

Alex Deucher wrote on 2004-08-16 16:39 UTC:
> > http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms.pdf
> > http://www.cl.cam.ac.uk/~mgk25/ucs/X11.keysyms
>
> Please post this as a bug in xorg bugzilla so it can be tracked and
> properly integrated:
> http://bugs.freedesktop.org

I've added it to

  http://freedesktop.org/bugzilla/show_bug.cgi?id=246

There is also a wiki page on the subject on

  http://freedesktop.org/XOrg/KeySyms

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

[I18n] Revision of keysymdef.h

2004-08-09 Thread Markus Kuhn

Over the years, a number of problems have piled up with X11/keysymdef.h,
which I have set out to sort out.

These are in particular:

  - Since X11R6.4 was released, XFree86 has added (in 1999 and 2000) a
    large number of new macro definitions that are not in line with the
    X11 protocol specification (Appendix A), and not all of them have
    been checked carefully against the Unicode standard.

  - The X11 standard so far lacks an official clarification of the
    relationship between keysyms and Unicode.

I have therefore reviewed which keysyms are actually used in keyboard
mapping files and have substantially reworked and cleaned up the
X11/keysymdef.h file. The result is available on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysymdef.h

The database file that I have prepared to control the changes
made is on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysyms.txt

and it contains further background information in the form of
comments and a status column for each keysym.

Quick summary of the changes:

  - I have preserved as newly added keysyms in the range reserved
    for the X11 standard only the following ones, as they are now
    widely used in XFree86 keyboard mapping files:

      0x06ad   Ukrainian_ghe_with_upturn
      0x06bd   Ukrainian_GHE_WITH_UPTURN
      0xfe60   dead_belowdot
      0xfe61   dead_hook
      0xfe62   dead_horn

    These will have to be added to Appendix A of the protocol spec.

  - A large number of Armenian, Georgian, Irish, Arabic, Cyrillic,
    Caucasus, and Vietnamese keysym macros were added by Pablo Saratxaga
    on 1999-06-06 and 2000-10-27. With a very small number of exotic
    exceptions, none of these keysyms are at present used in any
    XFree86 files.

    As many of them look useful, I have decided to remap them directly
    into the Unicode range by adding 0x01000000 to their Unicode value.
    This way, these keysyms will not have to be added to the X11 standard,
    as they are implicitly defined by ISO 10646.

  - I have also moved the currency symbols in the 0x20xx range
    to the Unicode mapping range, with the exception of the EuroSign,
    which is the only one of these that is actually used in keyboard
    mapping files.

  - I have added as comments to each keysym for which there exists
    a direct or approximate Unicode equivalent the Unicode position
    and name of this character.

  - Various minor editorial fixes for better consistency, and comments
    added that clarify the Unicode mapping rule.
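
To make the summary above concrete, here is a rough sketch of what the
affected definitions could look like in a header. The five preserved
values are the ones listed above; the Armenian entry and the helper
macro UCS_KEYSYM_BASE are only illustrative assumptions and are not
copied from the revised file.

```c
/* Sketch only, not the actual keysymdef.h. */

/* Hypothetical helper name for the offset of the Unicode range. */
#define UCS_KEYSYM_BASE 0x01000000UL

/* Preserved in the range reserved for the X11 standard. */
#define XK_Ukrainian_ghe_with_upturn 0x06ad  /* U+0491 CYRILLIC SMALL LETTER GHE WITH UPTURN */
#define XK_Ukrainian_GHE_WITH_UPTURN 0x06bd  /* U+0490 CYRILLIC CAPITAL LETTER GHE WITH UPTURN */
#define XK_dead_belowdot             0xfe60
#define XK_dead_hook                 0xfe61
#define XK_dead_horn                 0xfe62

/* Illustrative remapped entry: Unicode value plus the offset. */
#define XK_Armenian_ayb (UCS_KEYSYM_BASE + 0x0561)  /* U+0561 ARMENIAN SMALL LETTER AYB */
```

With this layout, a remapped macro carries no information beyond its
Unicode value, which is why such entries need no separate listing in the
standard.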

Unless someone shouts with an objection, I will submit these for inclusion
into the X.Org and XFree86 trees soon.

A proposed update to the X11 standard will follow shortly.

Markus

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain

Re: [I18n] Re: Emulation of Alt+Numpad+Digits behavior

2004-08-04 Thread Markus Kuhn

Alan Coopersmith wrote on 2004-08-04 15:00 UTC:
> > What I am trying to do is to emulate the
> > MS-Windows behaviour which lets one enter arbitrary characters by using
> > the Alt-Key while entering the character code on the numerical keypad.

The MS-Windows behaviour is somewhat cumbersome for several reasons:

  - It uses decimal numbers, whereas many people seem to be far more
    familiar with the hexadecimal numbers of well-known non-ASCII Unicode
    characters (20AC is the EURO SIGN, 2018/2019 are the left/right single
    quotation marks, 2013/2014 are en/em dash, 2212 is the minus sign,
    etc.), probably because the Unicode standard prints only the hex codes.

  - It requires two hands to be used simultaneously.

  - It requires the entry of a redundant leading zero, to avoid
    the MS-DOS CP437 backwards compatibility mode.

There is a much neater alternative standardized in

  ISO/IEC 14755 - Input methods to enter characters from
  the repertoire of ISO/IEC 10646 with a keyboard or other input devices
  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14755.pdf

which fixes all these problems.

If you do something in this area, please implement the ISO 14755 hex
input method, and not the old MS-Windows one. (Or implement both
together, if you really need MS-Windows compatibility here. They don't
interfere with each other, because the ISO 14755 technique uses
Ctrl-Shift to activate the hex-entry mode, while MS-Windows uses Alt.)
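
As a minimal sketch of the hex-entry idea, the C fragment below
accumulates typed hexadecimal digits into a code point. It assumes that
a session starts when Ctrl-Shift goes down and ends when it is released;
the type and function names are hypothetical, and the details of
ISO 14755 itself are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state of one hex-entry session. */
typedef struct {
    uint32_t value;   /* code point accumulated so far */
    int      digits;  /* number of hex digits entered */
} HexEntry;

/* Start a session (assumed: called when Ctrl-Shift is pressed). */
void hex_entry_begin(HexEntry *h)
{
    h->value = 0;
    h->digits = 0;
}

/* Feed one typed character. Returns false, leaving the state unchanged,
 * if it is not a hex digit or would be the ninth digit. */
bool hex_entry_feed(HexEntry *h, char c)
{
    uint32_t d;

    if (c >= '0' && c <= '9')
        d = (uint32_t)(c - '0');
    else if (c >= 'a' && c <= 'f')
        d = (uint32_t)(c - 'a' + 10);
    else if (c >= 'A' && c <= 'F')
        d = (uint32_t)(c - 'A' + 10);
    else
        return false;
    if (h->digits >= 8)
        return false;
    h->value = (h->value << 4) | d;
    h->digits++;
    return true;
}

/* End a session (assumed: called when Ctrl-Shift is released). Returns
 * true and stores the code point if at least one digit was typed and
 * the value is a valid Unicode scalar value. */
bool hex_entry_end(const HexEntry *h, uint32_t *ucs)
{
    if (h->digits == 0 || h->value > 0x10FFFF)
        return false;
    if (h->value >= 0xD800 && h->value <= 0xDFFF)
        return false;
    *ucs = h->value;
    return true;
}
```

Typing 2, 0, A, C inside one session would yield U+20AC, the euro sign
mentioned above.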

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

Re: [I18n] Keysyms for Unicode values

2004-04-15 Thread Markus Kuhn

Dave Williss wrote on 2004-04-14 22:39 UTC:
> I've noticed that for the most part, XKeysym values are just
> the Unicode value of the character.

No, look closer, they are not. Unicode did not exist yet when keysyms
were defined, therefore, keysyms are in a sense something similar to
Unicode, but the code positions are completely different.

> However, there are
> obviously Unicode values which would overlap keysyms
> in the 0xFE00 to 0xFFFF range which would conflict.
>
> I've also noticed references to passing keysyms with
> the 0x01000000 bit set to mean that the lower part of
> the keysym is a Unicode value.
>
> So my question is, for what Unicode values do I _need_ to
> use the 0x01000000 bit and what ones should I not?
> I assume that if just set it for everything, old X clients
> would be confused by it and not know what to do.

The relationship between the keysyms and Unicode is currently being
addressed in a revision of the X11 protocol standard appendix that
officially defines the keysyms.

You can watch some of this process on the X.Org wiki page

  http://freedesktop.org/XOrg/KeySyms

In a nutshell: for characters for which a keysym already exists (with a
few exceptions where the meaning of the keysym is unclear), use the
existing keysym value. For characters for which no keysym exists, add
0x01000000 to the Unicode value and use that instead. An official
round-trip compatible Unicode mapping table for the existing keysyms is
under preparation and will be part of the next major X.Org release.

http://www.cl.cam.ac.uk/~mgk25/unicode.html#x11
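
The rule in a nutshell can be sketched in C roughly as follows. The
two-entry table is only a stand-in for the real mapping table, which is
much larger, and the names ucs_to_keysym and legacy_table are
hypothetical.

```c
#include <stddef.h>

/* Illustrative excerpt of a Unicode -> pre-existing keysym table. */
struct ucs_keysym {
    unsigned long ucs;
    unsigned long keysym;
};

static const struct ucs_keysym legacy_table[] = {
    { 0x0104, 0x01a1 },  /* Aogonek */
    { 0x20ac, 0x20ac },  /* EuroSign */
};

/* Return the keysym to use for a Unicode character. */
unsigned long ucs_to_keysym(unsigned long ucs)
{
    size_t i;

    /* Latin-1 characters: the keysym equals the character code. */
    if ((ucs >= 0x0020 && ucs <= 0x007e) || (ucs >= 0x00a0 && ucs <= 0x00ff))
        return ucs;

    /* Characters that already have a keysym keep it. */
    for (i = 0; i < sizeof legacy_table / sizeof legacy_table[0]; i++)
        if (legacy_table[i].ucs == ucs)
            return legacy_table[i].keysym;

    /* Everything else: Unicode value plus 0x01000000. */
    return ucs + 0x01000000UL;
}
```

The lookup has to come before the fallback, since adding the offset to a
character that already has a keysym would create a second, incompatible
code for it.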

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

[I18n] UTF-8 ICCCM properties

2004-03-22 Thread Markus Kuhn

Juliusz wrote in

  http://www.pps.jussieu.fr/~jch/software/UTF8_STRING/UTF8_STRING.text

> In the interest of interoperability, the semantics of the selection
> target TEXT are *not* changed; in particular, replying with a
> selection type of type UTF8_STRING to a request specifying TEXT is
> explicitly *not* allowed.

Question: That was mostly in the context of how to handle xterm-style
cut&paste selections. What about the window manager property types

  WM_CLIENT_MACHINE
  WM_ICON_NAME
  WM_NAME

which are all specified in the ICCCM as type TEXT? If we exclude
UTF8_STRING from being used where a type TEXT is specified, we will have
no portable way of using Unicode in window titles and icon names.

Why can't we just jump into the cold water by adding UTF8_STRING to the
list of encodings allowed to be used when the polymorphic type TEXT is
used, with the simple restriction that STRING must be used whenever all
the characters of the string are contained in Latin-1?
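
A rough C sketch of the proposed restriction: given a title as valid
UTF-8, pick STRING when every character fits into Latin-1 and
UTF8_STRING otherwise. The function names are hypothetical, the input is
assumed to be well-formed UTF-8, and control characters, which STRING
restricts further, are not examined.

```c
#include <stdbool.h>

/* True if every character of a valid UTF-8 string lies in
 * U+0000..U+00FF. Such characters are encoded either as one ASCII byte
 * or as a two-byte sequence whose lead byte is 0xC2 or 0xC3. */
bool utf8_fits_latin1(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    while (*p) {
        if (*p < 0x80)
            p += 1;
        else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80)
            p += 2;
        else
            return false;
    }
    return true;
}

/* Name of the property type to use under the proposed rule. */
const char *text_property_type(const char *utf8)
{
    return utf8_fits_latin1(utf8) ? "STRING" : "UTF8_STRING";
}
```

A client would still have to convert the text to ISO 8859-1 before
storing it as STRING; the sketch only makes the type decision.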

I understand that it may temporarily break a few things if the
originator of a property uses UTF8_STRING and the recipient does not yet
understand it. But in practice, isn't that just the same situation as we
had when COMPOUND_TEXT was added in 1991 and C_STRING was added in 1993,
two types that even today are still not supported by most applications?

The only alternatives I could think of are all a bit ugly, such as
adding

  WM_CLIENT_MACHINE_UTF8
  WM_ICON_NAME_UTF8
  WM_NAME_UTF8

all of type UTF8_STRING.

Opinions?

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

Re: [I18n] Unicode keysym questions

2004-03-05 Thread Markus Kuhn

Alexander Krauss wrote on 2004-03-05 17:36 UTC:
> while trying to develop a keymap which includes mathematical symbols, I am
> wondering about the exact status of the UCS keysyms 0x01000000 and
> above... Are these already standardized? Do any X servers except XF86
> currently use them?

The X.Org Foundation has given me access to their CVS just last week to
amend the X11 protocol specification and to make this convention
official. I was on a phone conference with them last Monday and they all
agreed that adding the 0x01000000 convention to the standard would be
most sensible.

> And... how exactly should they be interpreted by clients? Should there be
> any difference between for example eacute and U00E9?

You will have to continue to use the existing keysyms if a character has
one. The +0x01000000 Unicode mapping is exclusively meant for adding any
new keysyms for which there isn't already an existing code. This is to
preserve backwards compatibility.

Having said that, we may decide to retire a couple of the most obscure
of the old keysyms for which simply the semantics has been lost in the
mist of time, and where we are confident that +/- 0 people are actually
using them.

A more tricky question is what to do with the unauthorized addition of
new Latin-8, Vietnamese, and Arabic keysyms a while ago by someone in
XFree86 in the code space that used to be reserved for X.Org.

I am mildly inclined to remove these and replace them with the
equivalent +0x01000000 Unicode mappings, in the interest of keeping
mapping tables small, but I don't know how widely they have become used
since XFree86 added them.

> Should a client
> interpret a U001B as an escape keystroke

None of the values in the range 0x01000000 to 0x01000100 will
technically be assigned keysyms, as all ISO 8859-1 codes already have
other code positions assigned. What your client decides to do if it
receives one of these nevertheless (or any other random unassigned keysym
value) will therefore be outside the X11 protocol specification.

> or are they all by definition
> characters and should be interpreted e.g. as the user wants this thing
> in his UTF-8 document...

If you want to add such a function to your client, then that is up to
you. However, a correctly configured X11 server should never send out a
0x0100001b keysym. Anything else would be a non-backwards compatible
modification of the X11 protocol that is likely to meet resistance
within X.Org.
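
For the client side, a minimal C sketch of decoding keysyms from the
directly mapped range might look like this. The function name is
hypothetical; the lower bound follows the remark above that the values
covering ISO 8859-1 are not assigned, and the upper bound of 0x0110ffff
(the end of the Unicode code space plus the offset) is an assumption
here.

```c
#include <stdbool.h>

/* If keysym lies in the directly mapped Unicode range, store the
 * Unicode value in *ucs and return true. Values whose Unicode part
 * would fall into ISO 8859-1 (such as 0x0100001b) are rejected, since
 * those characters already have other keysyms. */
bool unicode_keysym_to_ucs(unsigned long keysym, unsigned long *ucs)
{
    if (keysym < 0x01000100UL || keysym > 0x0110ffffUL)
        return false;
    *ucs = keysym - 0x01000000UL;
    return true;
}
```

A keysym that this function rejects would then be looked up in the
table of pre-existing keysyms, or ignored if it is unassigned.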

> Or is this simply not strictly defined?

We can define it now as strictly as we want and need, because the text
passage that defines that officially will be written over the next few
weeks.

> I also noticed that the Compose-Files of 4.3.0 in UTF-8 locales use the
> Uxxxx keysyms even for characters that have old keysyms (all the
> accented latin-{12...}  chars).

I would argue that any Uxxxx notation used in compose files will have to
go through a special unicode2keysym conversion function that uses a
mapping table. You cannot simply add 0x01000000 to *any* Unicode
character to get its keysym. If XFree86 doesn't do that conversion
correctly at the moment, please file this in the XFree86 bugzilla so
that it will not get lost. Check what keysym values these compose files
produce on the wire, which is all that counts in the end.

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

[I18n] X.Org (Foundation) waking up / UTF8_STRING / Keysyms

2004-02-28 Thread Markus Kuhn

Rumour has it that X.Org, an organization we long believed to be dead,
has woken up again, changed its name to X.Org Foundation, and finally
wants to update the X11 spec:

  http://www.x.org/XOrg_Foundation.html

UTF8_STRING has already made it onto the agenda:

  http://www.opengroup.org/sophocles/show_mail.tpl?source=L&listname=xorg_arch&id=16

Keysyms probably will be next. More on the xorg_arch mailing list:

  http://www.x.org/XOrg_Foundation_Join_OpenLists.html

Markus

-- 
Markus Kuhn, Computer Laboratory, University of Cambridge
http://www.cl.cam.ac.uk/~mgk25/ || CB3 0FD, Great Britain

Re: [I18n] Unicode

2003-01-14 Thread Markus Kuhn

Bharathi S wrote on 2003-01-14 12:17 UTC:
> How to send a 16Bit Unicode value to a Application ? If I use the
> XmodMap, then Which Xlib function is responsible for taking the
> Unicode Value from XModMap ?

Make sure you are in a UTF-8 locale and use the keysym value 0x0100abcd
with xmodmap, in order to represent the Unicode character U+abcd.

Also read:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#x11

Instead of xmodmap, also consider using xkbcomp.
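
To get the hexadecimal value to put into a keymap for a given character,
a small helper along these lines could be used. This is only a sketch;
the function name is hypothetical, and it is suitable only for
characters that have no pre-existing keysym.

```c
#include <stdio.h>

/* Write the keysym for Unicode character ucs into buf as "0x0100abcd"
 * style text. Returns what snprintf returns. */
int format_unicode_keysym(char *buf, size_t size, unsigned long ucs)
{
    return snprintf(buf, size, "0x%08lx", 0x01000000UL + ucs);
}
```

For U+abcd this produces the string 0x0100abcd used in the example
above.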

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__

Re: [I18n]per mille symbol, ligature symbols?

2003-01-03 Thread Markus Kuhn

Andreas Tobler wrote on 2003-01-03 12:36 UTC:
> e.g. key TLDE {[0x01000000+0x2030]}

0x01002030

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/

[I18n]Re: Decimal key on European keyboard layouts

2002-12-19 Thread Markus Kuhn

Dr Andrew C Aitchison wrote on 2002-12-13 10:13 UTC:
> Looking at the unicode charts (especially the character name index
>   http://www.unicode.org/charts/charindex.html
> ) I see that ASCII dot 0x2E has become Unicode 0x002E Decimal Point
> and ASCII comma 0x2C has become 0x002C decimal separator.
> http://www.unicode.org/charts/PDF/U0000.pdf
> renders these in the English way, not the continental one you desire.

U+002E = FULL STOP
U+002C = COMMA

There is no question at all in Unicode about how these two characters
have to be rendered. Their rendering is locale independent.

There was discussion long ago about adding a decimal separator
character to Unicode, but the idea was considered unnecessary and
confusing and therefore dropped. ISO has in the past suggested using a
tiny downwards-facing triangle that is around the size of a full stop or
comma as a culturally neutral glyph for a decimal separator key, but
that too has not caught on.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/

Re: Solution. was:[I18n]XFree86 Xutf8LookupString BUG with Solarix X server.

2002-11-30 Thread Markus Kuhn

Thanks for the fast bug fix. I hope RedHat/SuSE/etc. will fix this in
their RPMs *very soon*, otherwise the UTF-8 locales remain completely
unusable for everyone on an X server without XKBD (e.g., Solaris).

Ivan Pascal wrote on 2002-11-29 12:46 UTC:
>   But under the UTF-8 locale XLookupString returns two (and more) chars
> for non-ascii keysyms.  And X{mb|wc|utf8}LookupString mistakely converted
> them to UTF-8 one more time.

Wasn't Xutf8LookupString supposed to be guaranteed to be locale
encoding *independent*? So why does it have to be implemented on top of
the (apparently) locale-dependent XLookupString? Sounds not entirely
kosher ...

> the X{mb|wc|utf8}LookupString family has a checking which discards
> non-ascii char outputed by XLookupString if it is only one

Why is it necessary to distinguish between ASCII and non-ASCII
characters?

> xc/lib/X11/XKB.c

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/


Re: [I18n]XFree86 Xutf8LookupString BUG with Solarix X server

2002-11-28 Thread Markus Kuhn

Juliusz Chroboczek wrote on 2002-11-28 11:36 UTC:
> 'A knot!' said Alice.  'Oh, do let me help to undo it!'
>
> Could you please put an xscope dump on the web somewhere ?

Thanks! I didn't know about xscope.

The requested dump is now on

  http://www.cl.cam.ac.uk/~mgk25/ucs/xev-adiaeresis-utf8.txt

including a full description of what I did to get the log.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/

[I18n]xev XIM patch

2002-11-27 Thread Markus Kuhn

In order to hunt down a very odd problem with the XmbLookupString and
Xutf8LookupString functions, I have extended the good old xev command to
print the output of these two functions as well, in addition to that of
XLookupString.

In order to use XmbLookupString or Xutf8LookupString, one needs to
provide an X Input Context (XIC), which one can get after opening an X
Input Method (XIM).

Not being an XIM guru, I have copied and simplified the minimally
necessary code to get it running from xterm-170's charproc.c:VTInitI18N.

Could someone who is at least slightly more familiar than me in XIM
matters have a brief look at the attached patch (especially the last
hunk), before I send it to the CVS maintainers?

http://devel:passwd@www.xfree86.org/devel/cgi-bin/cvsweb.cgi/~checkout~/xc/programs/xev/xev.c?content-type=text/plain

Thanks!

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/


--- xev.c.orig  Tue Nov 26 02:10:51 2002
+++ xev.c   Wed Nov 27 16:15:26 2002
@@ -30,117 +30,152 @@
  */
 /* $XFree86: xc/programs/xev/xev.c,v 1.6 2002/11/26 02:10:51 dawes Exp $ */
 
 /*
  * Author:  Jim Fulton, MIT X Consortium
  */
 
 #include <stdio.h>
 #include <stdlib.h>
 #include <ctype.h>
 #include <X11/Xlocale.h>
 #include <X11/Xos.h>
 #include <X11/Xlib.h>
 #include <X11/Xutil.h>
 #include <X11/Xproto.h>
 
 #define INNER_WINDOW_WIDTH 50
 #define INNER_WINDOW_HEIGHT 50
 #define INNER_WINDOW_BORDER 4
 #define INNER_WINDOW_X 10
 #define INNER_WINDOW_Y 10
 #define OUTER_WINDOW_MIN_WIDTH (INNER_WINDOW_WIDTH + \
                                 2 * (INNER_WINDOW_BORDER + INNER_WINDOW_X))
 #define OUTER_WINDOW_MIN_HEIGHT (INNER_WINDOW_HEIGHT + \
                                 2 * (INNER_WINDOW_BORDER + INNER_WINDOW_Y))
 #define OUTER_WINDOW_DEF_WIDTH (OUTER_WINDOW_MIN_WIDTH + 100)
 #define OUTER_WINDOW_DEF_HEIGHT (OUTER_WINDOW_MIN_HEIGHT + 100)
 #define OUTER_WINDOW_DEF_X 100
 #define OUTER_WINDOW_DEF_Y 100
 
 
 typedef unsigned long Pixel;
 
 const char *Yes = "YES";
 const char *No = "NO";
 const char *Unknown = "unknown";
 
 const char *ProgramName;
 Display *dpy;
 int screen;
+XIM xim = (XIM) NULL;
+XIC xic = (XIC) NULL;
 
 void
 prologue (eventp, event_name)
     XEvent *eventp;
     char *event_name;
 {
     XAnyEvent *e = (XAnyEvent *) eventp;
 
     printf ("\n%s event, serial %ld, synthetic %s, window 0x%lx,\n",
             event_name, e->serial, e->send_event ? Yes : No, e->window);
 }
 
+void
+hexdump (s, len)
+    char *s;
+{
+    for (; len > 0; len--, s++)
+        printf("%02x%s", (unsigned char) *s, len > 1 ? " " : "");
+}
 
 void
 do_KeyPress (eventp)
     XEvent *eventp;
 {
     XKeyEvent *e = (XKeyEvent *) eventp;
     KeySym ks;
     char *ksname;
     int nbytes;
     char str[256+1];
 
     nbytes = XLookupString (e, str, 256, &ks, NULL);
     if (ks == NoSymbol)
         ksname = "NoSymbol";
     else if (!(ksname = XKeysymToString (ks)))
         ksname = "(no name)";
     printf ("root 0x%lx, subw 0x%lx, time %lu, (%d,%d), root:(%d,%d),\n",
             e->root, e->subwindow, e->time, e->x, e->y, e->x_root, e->y_root);
     printf ("state 0x%x, keycode %u (keysym 0x%lx, %s), same_screen %s,\n",
             e->state, e->keycode, (unsigned long) ks, ksname,
             e->same_screen ? Yes : No);
     if (nbytes < 0) nbytes = 0;
     if (nbytes > 256) nbytes = 256;
     str[nbytes] = '\0';
-    printf ("XLookupString gives %d bytes:  \"%s\"\n", nbytes, str);
+    printf ("XLookupString gives %d bytes:  \"%s\" (", nbytes, str);
+    hexdump(str, nbytes);
+    printf (")\n");
+    if (e->type == KeyPress) {
+        if (xic) {
+            nbytes = XmbLookupString(xic, e, str, 256, &ks, NULL);
+            if (nbytes < 0) nbytes = 0;
+            if (nbytes > 256) nbytes = 256;
+            str[nbytes] = '\0';
+            printf ("XmbLookupString gives %d bytes:  \"%s\" (",
+                    nbytes, str);
+            hexdump(str, nbytes);
+            printf (")\n");
+        }
+#ifdef X_HAVE_UTF8_STRING
+        if (xic) {
+            nbytes = Xutf8LookupString(xic, e, str, 256, &ks, NULL);
+            if (nbytes < 0) nbytes = 0;
+            if (nbytes > 256) nbytes = 256;
+            str[nbytes] = '\0';
+            printf ("Xutf8LookupString gives %d bytes:  \"%s\" (",
+                    nbytes, str);
+            hexdump(str, nbytes);
+            printf (")\n");
+        }
+    }
+#endif
 }
 
 void
 do_KeyRelease (eventp)
     XEvent *eventp;
 {
     do_KeyPress (eventp);  /* since it has the same info */
 }
 
 void
 do_ButtonPress (eventp)
     XEvent *eventp;
 {
     XButtonEvent *e = (XButtonEvent *) eventp;
 
     printf ("root 0x%lx, subw 0x%lx, time %lu, (%d,%d), root:(%d,%d),\n",
             e->root, e->subwindow, e->time, e->x, e->y, e->x_root, e->y_root);
     printf ("state 0x%x, button %u, same_screen %s\n",
             e->state, e->button, e->same_screen ? Yes : No);
 }
 
 void
 do_ButtonRelease (eventp)
     XEvent *eventp;
 {

[I18n]XFree86 Xutf8LookupString BUG with Solarix X server

2002-11-27 Thread Markus Kuhn

I think I have run into a serious bug with XFree86's Xutf8LookupString
implementation. It occurs when the client runs under XFree86 4.[12], but
the X server is for example Solaris 5.8

  vendor string:    Sun Microsystems, Inc.
  vendor release number:    6410

(also occurs on Solaris 5.7). It does not occur, however, when the X
server is XFree86 4.1.

i) To reproduce the problem, start an X client in the following
environment:

  - use a UTF-8 locale (e.g., LC_CTYPE=en_GB.UTF-8)
  - use an XFree86 4.1 or 4.2 Linux system (tested on Red Hat 7.2,
Red Hat 8.0 and SuSE 8.1)
  - point $DISPLAY to a Sun Solaris X server

Then press on the Sun X server a key that causes the keysym
adiaeresis to be sent to the above client. The client will
receive from the various key string lookup functions the
following strings (as displayed in a UTF-8 xterm, hex values
provided for clarity):

XLookupString gives 2 bytes:  ä (c3 a4)
XmbLookupString gives 4 bytes:  Ã¤ (c3 83 c2 a4)
Xutf8LookupString gives 4 bytes:  Ã¤ (c3 83 c2 a4)

There are two problems, the first critical, the second dubious:

  a) CRITICAL: Both X{mb,utf8}LookupString output the same broken
     byte sequence that one gets if one sends the UTF-8 sequence for
     ä (c3 a4) erroneously through an ISO 8859-1 -> UTF-8 converter,
     i.e. c3 83 c2 a4.

  b) DUBIOUS: XLookupString is according to the manual supposed to *always*
 return ISO 8859-1 strings (just like STRING atoms always use ISO 8859-1),
 but here it actually returns text in the locale's multibyte encoding.
 (This is ok, if we can agree to change the libX11 C API
 definition accordingly, but it looks suspiciously like someone has
 been HACKing without respect for the API spec).
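
The broken byte sequence under a) can be reproduced with a few lines of
C. This is only a sketch of a generic ISO 8859-1 to UTF-8 converter with
a hypothetical function name, not the code path inside libX11.

```c
#include <stddef.h>

/* Convert len bytes of ISO 8859-1 text to UTF-8. The output buffer
 * must have room for 2 * len bytes. Returns the number of bytes
 * written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            out[n++] = in[i];
        } else {
            /* U+0080..U+00FF become a two-byte sequence. */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

Converting the single byte e4 once gives c3 a4. Feeding those two bytes
through the same converter a second time gives c3 83 c2 a4, which is
exactly the sequence seen above.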

ii) If the same setup as in i) is used, but the locale of the client
replaced with an ISO 8859-1 locale (e.g., en_GB), then the result looks
correct (as displayed in an ISO 8859-1 xterm):

XLookupString gives 1 bytes:  ä (e4)
XmbLookupString gives 1 bytes:  ä (e4)
Xutf8LookupString gives 2 bytes:  Ã¤ (c3 a4)

iii) Similarly, if the same setup as in i) is used, but the X server is
XFree86 4 with e.g.

  vendor string:    The XFree86 Project, Inc
  vendor release number:    4010
  XFree86 version: 4.1.0

at least the CRITICAL problem is gone (as displayed in a UTF-8 xterm):

XLookupString gives 2 bytes:  ä (c3 a4)
XmbLookupString gives 2 bytes:  ä (c3 a4)
Xutf8LookupString gives 2 bytes:  ä (c3 a4)

All the above output is from a patched version of xev that outputs
the strings from all three lookup functions.

This bug report distills my findings reported here earlier that
UTF-8 keyboard support fails from a Sun X server with the xterm and
emacs implementations in Red Hat 8.0.

Any ideas or reports of reproducibility would be welcome. This might
turn into a high priority problem, as breaking the X protocol this way
might be a major UTF-8 show stopper. Please have a look at it ...

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/

Re: [I18n]Unicode / 16-Bit

2002-11-19 Thread Markus Kuhn

> Is the X (4.x) supporting Unicode/16 Bit encoding ? How ?

Yes, to some degree. To get started, try:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/

Re: [I18n]Syriac keyboard layout

2002-11-14 Thread Markus Kuhn

Emil Soleyman-Zomalan wrote on 2002-11-14 16:00 UTC:
> Just for my own knowledge, what would be the disadvantage of creating a
> new set of keysyms for Syriac as already has been done for several other
> languages?

There is nothing wrong in principle with adding new keysyms, however the
integer number associated with new keysyms for which there is an
equivalent Unicode U-00xxxxxx character MUST be 0x01xxxxxx, because
we do not want to let the keysym table grow needlessly (to more than
60000 entries). However, since keysyms are never visible to anyone but
people who write keyboard layout tables, it is a senseless exercise
to add new ones, so we stopped it entirely and now consider keysyms
merely to be a frozen pre-Unicode era artifact. Keysyms will only be added
for keyboard keys, for which it is unlikely that there will ever be a
good equivalent in Unicode.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/

Re: [I18n]Locales and charset encodings

2002-11-12 Thread Markus Kuhn

Juliusz Chroboczek wrote on 2002-11-12 16:49 UTC:
> Use nl_langinfo(3), but it's an SUSv2 interface (i.e. it's not in POSIX)

POSIX.{1,2}:2001 and SUSv3 are the same thing now.

So nl_langinfo(CODESET) is now actually officially part of the holy
dogma of the church of POSIX. Hallelujah.

Unfortunately, the return values of nl_langinfo(CODESET) are not
standardized. Fortunately, the systems out there that implement
nl_langinfo(CODESET) can be counted on the fingers of one hand
and their outputs are easily normalized by a little routine:

  http://www.cl.cam.ac.uk/~mgk25/ucs/norm_charmap.c

On legacy systems without nl_langinfo(CODESET), you can get a decent
educated guess from this routine:

  http://www.cl.cam.ac.uk/~mgk25/ucs/langinfo.c

More on this issue is on:

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#activate
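
As a minimal sketch of how a program might test for a UTF-8 locale with
nl_langinfo(CODESET), tolerating spelling variants such as "UTF-8" and
"utf8": the function names below are hypothetical, and this is not the
norm_charmap.c routine referenced above, which handles far more names.

```c
#include <ctype.h>
#include <langinfo.h>
#include <locale.h>
#include <stdbool.h>

/* True if name spells "UTF8", ignoring case and any hyphens. */
bool codeset_is_utf8(const char *name)
{
    const char *want = "UTF8";

    for (; *name; name++) {
        if (*name == '-')
            continue;
        if (*want == '\0' || toupper((unsigned char)*name) != *want)
            return false;
        want++;
    }
    return *want == '\0';
}

/* Select the locale from the environment and ask for its codeset. */
bool locale_uses_utf8(void)
{
    setlocale(LC_CTYPE, "");
    return codeset_is_utf8(nl_langinfo(CODESET));
}
```

A program would typically call locale_uses_utf8() once at startup and
choose its input and output paths accordingly.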

DW> Would there be a way to _force_ the encoding to, say UTF-8?

Unfortunately not with any formally standardized functions. The inertia
of standardization bodies currently makes it infeasible to establish the
notion that UTF-8 is a bit more useful than yet another
multibyte encoding. Until then, XFree86 has added Xutf8*() functions for
that exact purpose, and we strongly hope that they will catch on with
other implementors as well. Though not all have seen the light yet.

Don't expect any progress with the X11 spec soon. What was supposed to
be the X11 standards body (X.Org) has been in a coma for several years
and is awaiting its death certificate.

Markus


Re: [I18n]Re: [li18nux:1094] BACKSPACE behavior

2002-10-03 Thread Markus Kuhn


I had originally argued strongly in favour of BACKSPACE display
semantics that remove the character left of the cursor (let's call this
character L) and then move the cursor wcwidth(L) character cells
to the left. This is by far the most sensible solution, because this
way, if you echo the keyboard output back into the display, pressing
backspace gives you exactly the same effect as you would expect
in an editor. The result would have been that, in order for backspace
to work correctly with double-width (and combining) characters,
no changes would have had to be made to the tty cooked-mode editor in
the kernel that you get when you type text into stdin of any
Unix application.

Unfortunately, existing CJK implementation practice has messed this
up and has used backspace with move-cursor-left-one-cell display
semantics. An argument has been made that UTF-8 modes have to stay
compatible with this highly unfortunate and inconvenient CJK
implementation practice, but I am still not convinced that

  a) there really is such a backwards compatibility requirement
  b) the 1-cell-left semantics of backspace has any advantage
     whatsoever over the erase-1-character-left semantics

I would say at least that the jury is still out on what a backspace
sent to a UTF-8 terminal means, and I'd advise authors of editors
not to send any backspace 0x08 characters to terminals. Please use
absolute or relative cursor positioning command sequences, which have
unambiguous semantics.

Markus


Re: [I18n]Deadkeys for ntilde and accents on a US keyboard?

2002-10-02 Thread Markus Kuhn


Oscar A. Valdez wrote on 2002-10-01 22:37 UTC:
 How do I configure XFree86 v. 4.1.0 to get ñ,Ñ,á,é,í,ó and ú on a
 US-layout keyboard?

If it is only for personal single-user use, then .Xmodmap is still
the simplest solution.

I routinely type German and English text on a UK keyboard with the
following .Xmodmap file (which also disables the annoying caps-lock
key):

! to get capslock back: xmodmap -e 'add Lock = Caps_Lock'
clear lock
keysym a = a NoSymbol adiaeresis NoSymbol
keysym o = o NoSymbol odiaeresis NoSymbol
keysym u = u NoSymbol udiaeresis NoSymbol
keysym s = s NoSymbol ssharp NoSymbol
keysym p = p NoSymbol section Greek_pi
keysym d = d NoSymbol degree NoSymbol
keysym e = e NoSymbol EuroSign   NoSymbol
keysym i = i NoSymbol idiaeresis NoSymbol
keysym m = m NoSymbol emdash mu
keysym n = n NoSymbol endash NoSymbol
keysym space = space NoSymbol nobreakspace NoSymbol
keycode 34 = bracketleft braceleft leftsinglequotemark leftdoublequotemark
keycode 35 = bracketright braceright rightsinglequotemark rightdoublequotemark
keysym minus = minus underscore 0x01002212 NoSymbol

Just add the command xmodmap .Xmodmap to your .xsession file.

man xmodmap
man X
less /usr/include/X11/keysymdef.h 

Note that practically all UK PC keyboards have an AltGr key. I understand
that in the US, unfortunately, not all PC keyboards have this key
available for the Mode_switch keysym, so you have to define some other
key (Alt_R, Control_R, F1, etc.) to be the Mode_switch key.

Markus


Re: [I18n]unicode

2002-09-30 Thread Markus Kuhn


Viveka Nathan K wrote on 2002-09-30 10:56 UTC:
   I wish to use the unicode encoding.
   How can I know, which applications are supporting the unicode.
   What should I need to do, to make an application to support unicode ?

Read

  http://www.cl.cam.ac.uk/~mgk25/unicode.html

to get started.

Markus


Re: [I18n]Default fonts for xterm

2002-08-26 Thread Markus Kuhn


Tomohiro KUBOTA wrote on 2002-08-26 08:48 UTC:
 The first is that simply using *-iso10646-1 fonts as defaults.

This could already be achieved by changing in
/usr/lib/X11/fonts/misc/fonts.alias the line

  fixed  -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso8859-1

to

  fixed  -misc-fixed-medium-r-semicondensed--13-120-75-75-c-60-iso10646-1

This was my goal many years ago, when I started the misc-fixed
extension project and had no idea how awful the X font system really
is. Unfortunately, merely changing fixed turned out not to be feasible
because

  a) The X protocol is very inefficient in handling sparse 16-bit encodings.
  b) Some legacy applications were unfortunately hardwired to assume that
     fixed is an 8-bit font.

I think a) is the more critical reason, and a 16-bit font should be used
by xterm by default only if a better font API such as Xft is used instead of
XLoadQueryFont().

 The second solution is to implement UTF-8-specific font configuration
 items, like uFont, uFont2, uWideFont4, and so on.

This is a very good idea, and I remember that it was discussed and
welcomed here before. I had also thought that someone had already written
a patch to do this, or had planned to do so, but apparently it never made
it into xterm. Using an independent set of font resource entries in
UTF-8 mode seems to me the right thing to do.

 I think the second one is better, though the first one is simpler and
 not very harmful.  However, I don't have enough time to work on these
 solutions.

I agree that the second solution is what should be done (but I don't have
the time to submit a patch myself either).

Markus


Re: [I18n]Displaying chinese in an xterm

2002-08-14 Thread Markus Kuhn


Jungshik Shin wrote on 2002-08-06 15:05 UTC:
   You can use one of 18pixel iso10646-1 bitmap fonts included in XF86
 4.x with more CJK characters than 13pixel font:
 
 -misc-fixed-medium-r-normal-ko-18-120-100-100-c-180-iso10646-1
 -misc-fixed-medium-r-normal-ja-18-120-100-100-c-180-iso10646-1
 
 However, I believe neither of them has the full coverage of GB 2312.

From README in http://www.cl.cam.ac.uk/~mgk25/download/ucs-fonts.tar.gz

   12x13ja.bdf:

 Covers all CP1252, CP437, JIS X 0208, and Hangul characters, and
 a few more. This font is primarily intended to provide Japanese
 full-width Hiragana, Katakana, and Kanji for applications that
 take the remaining (halfwidth) characters from 6x13.bdf. Might
 in the future be extended to cover TARGET2 if there is sufficient
 interest in using it as a stand-alone fixed-width font without
 6x13. The Greek lowercase characters in it are still a bit ugly
 and will need some work.

  18x18ja.bdf:

 Covers all JIS X 0208, JIS X 0212, GB 2312-80, KS X 1001:1992,
 ISO 8859-1,2,3,4,5,7,9,10,15, CP437, CP850 and CP1252 characters,
 plus a few more, where priority was given to Japanese han style
 variants. This font should have everything needed to cover the
 full ISO-2022-JP-2 (RFC 1554) repertoire. This font is primarily
 intended to provide Japanese full-width Hiragana, Katakana, and
 Kanji for applications that take the remaining (halfwidth)
 characters from 9x18.bdf.

  18x18ko.bdf:

 Covers the same repertoire as 18x18ja plus full coverage of all
 Hangul syllables and priority was given to Hanja glyphs in the
 unified CJK area as they are used for writing Korean.

What admittedly is still missing is an 18x18zh.bdf font that gives
priority to Chinese style variants, but GB 2312 is certainly covered by
both 18x18 fonts.

Markus


[I18n]Re: Welsh support needed for XFree86

2002-08-14 Thread Markus Kuhn


Attached is an old email that represents the most authoritative
information that I have on the diacritic characters used in dictionaries
of the Welsh language. Hope this helped ...

Markus


--- Forwarded Message

Date: Tue, 18 Aug 1998 17:10:15 +0100
To: [EMAIL PROTECTED]
From: Andrew Hawke [EMAIL PROTECTED]
Subject: Welsh character sets (LONG MESSAGE)

Markus, 
   you e-mailed [EMAIL PROTECTED] regarding the frequency
of certain Welsh letter+accent combinations. He submitted your query
to the WELSH-L discussion list. I have replied to the list, but I also
felt that I should take the liberty of contacting you directly, as this
is something I have strong views on.

Some background:
I am Assistant Editor and Systems Manager for the University of Wales 
Dictionary of Welsh, the standard scholarly dictionary of the language. 
I also chair the Celtic Texts Specialist Group of the International
Association of Literary and Linguistic Computing. The University of
Wales has an orthography committee which publishes guidelines for
Welsh spelling which are accepted by all Welsh writers and
publishers. These notes are based on those guidelines. Welsh is now
legally one of the two official languages of Wales, on an equal
legal footing with English. The government has established a body
called the Welsh Language Board to promote the use of Welsh. The
language is now taught in every school in Wales (and is the main
language of instruction in many of them). Some 600 books and
many magazines and newspapers are published annually. The use of the
language in all spheres, and increasingly in business, public life,
the administration of justice, education, government and the media 
(there is a Welsh-language TV channel) is growing rapidly. Welsh is
spoken by approximately 500,000 people in Wales, and by several hundred
thousand outside Wales. The number of speakers showed a slight increase
at the last census, after nearly a century of continuous decline.

The availability of character sets to represent the language is
absolutely essential, and such character sets should be as complete
as possible. In the past, the lack of appropriate character sets
has been a considerable deterrent to using the language in print and
electronically. I would urge you to bear this in mind when considering
the following.

Johann van Wingen (of the Netherlands WG on ISO 10646) pushed hard for
the inclusion of all the possible Welsh letter/accent combinations,
which was eventually accepted by the ISO and subsequently Unicode.

Microsoft has also committed to including the 13 additional characters
in its OpenType fonts. I have communicated extensively on this point
with John Hudson of Tiro Typeworks in Vancouver (www.tiro.com) who has
been working on OpenType fonts for Microsoft and for academic purposes.
I reproduce below my main comments to him which may be of assistance
to you.

= COPIED MATERIAL FOLLOWS 

Modern usage of the diacritics in Welsh is as follows:

(All diacritics are shown following the vowel which is accented, e.g.
a^ represents a lower-case a with a circumflex accent.)

Welsh requires the circumflex (^), acute ('), grave (`), and diaeresis (")
on all vowels, i.e. a, e, i, o, u, w, y (w being used in Welsh both as a
vowel and a semi-vowel). The incidence of these combinations varies very
widely.

All diacritics (accents) in Modern standard Welsh are compulsory and are
used to differentiate between different pronunciations of otherwise
similar- or identical-looking words, either in terms of length (long vs.
short) or stress. The stress accent in Welsh always falls on the penultimate
syllable, unless an accent (or a hyphen or an inserted h) indicates otherwise.

BECAUSE OF THIS, ALL THE ACCENTED WELSH CHARACTERS ARE REQUIRED, IN BOTH
UPPER- AND LOWER-CASE FORMS.

The circumflex is used solely to indicate that a vowel is long in a context
in which it would normally be expected to be short, e.g.:

gwa^n `he pierces'          vs.  gwan `weak'
gwe^n `a smile'             vs.  gwen `white (fem.)'
pi^n `pine (wood, tree)'    vs.  pi`n `a pin'
co^r `a choir'              vs.  cor `a dwarf'
bu^m `I was (perfect)'      vs.  bum `five (mutated)'
tw^r `a tower'              vs.  twr `a group'
y^m `we are'                vs.  ym `in (before m)'

The diaeresis is used to separate vowels, as in English:

prosaig `prosaic', crewr `creator', copio `to copy',
troedigaeth `conversion', duwch `blackness', Rebacayddiaeth
`Rebaccaism', cywres `concubine'

The acute accent is used to indicate unexpected stress (i.e. not on the 
penultimate):

casa'u `to hate', case't `cassette', ricri'wt `a recruit'
paraso'l `a parasol', rebu'wc `a rebuke', 
caridy'ms `riff-raff', gw'raidd `manly'

[I18n]Re: POSIX:2001 freely available, STIX Fonts

2002-07-10 Thread Markus Kuhn


Keith Packard wrote on 2002-07-10 01:09 UTC:
  AFAIK, SUS and POSIX say
  that it's implementation-dependent.
 
 Too bad the POSIX spec is closed so I can't check.

For all of you who haven't heard yet, SUSv3 and POSIX:2001 are now the
same thing and are freely available online at

  http://www.unix-systems.org/version3/

Bookmark now.

Also interesting: the folks who brought you the free Type1 versions of
Computer Modern have agreed to put together a comprehensive free
high-quality Unicode font for scientific publishing:

  http://www.stixfonts.org/

Markus


Re: [Fonts]Re: [I18n]language tags in fontconfig

2002-07-09 Thread Markus Kuhn


Keith Packard wrote on 2002-07-06 10:34 UTC:
 I got the European coverage information from 
 
   http://www.evertype.com/alphabets/
 
 I don't know why all of the latin languages include @ and ', it's 
 probably just a mistake; they're easily removed.

Actually, thanks to URLs and email addresses, which can and do contain
*all* ASCII characters, in practice full 7-bit ASCII coverage is
required for writing *any* contemporary language. Only Romans,
Egyptians, Babylonians, Etruscans, etc. still get away without email ... :)

In addition, UCS specifically states that no UCS subset should exclude
the Basic Latin range of U+0020 to U+007E.

Therefore, the ASCII coverage of Michael Everson's alphabet list should
be seen more as an academic curiosity, and not as relevant to
implementations.

Markus


[I18n]Please do not use en_US.UTF-8 outside the US

2002-04-30 Thread Markus Kuhn


As we are talking about en_US.UTF-8:

General warning: Please do not use the locale name en_US.UTF-8 anywhere
outside North America. Some older Solaris documentation suggested that
this is the only UTF-8 locale you'll ever need, as locales don't change
much of significance beyond the encoding anyway. This is not the case
any more today!

An increasing number of programs of US origin are finally starting to
abandon the annoying old habit of assuming Letter paper and non-metric
units as default conventions everywhere, requiring 95% of the world
population to figure out how to reconfigure to the standard conventions.

More recent software releases instead determine the default setting for
conventions such as paper format and units of measurement with code
similar to the following (feel free to copy it into your software as
well):


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* LC_PAPER and LC_MEASUREMENT were introduced in ISO/IEC TR 14652 */

int main(void)
{
  const char *units = "mm";
  const char *paper = "A4";
  const char *s;

  /* First non-empty variable wins: LC_ALL, then the category, then LANG */
  if (((s = getenv("LC_ALL"))   && *s) ||
      ((s = getenv("LC_PAPER")) && *s) ||
      ((s = getenv("LANG"))     && *s))
    if (strstr(s, "_US") || strstr(s, "_CA"))
      paper = "Letter";
  if (((s = getenv("LC_ALL"))         && *s) ||
      ((s = getenv("LC_MEASUREMENT")) && *s) ||
      ((s = getenv("LANG"))           && *s))
    if (strstr(s, "_US"))
      units = "inches";

  printf("Paper: %s\nUnits: %s\n", paper, units);

  return 0;
}


This leads to portable and agreeable default settings, using the
standard values UNLESS you are in a locale that explicitly says that
you are in North America. I think that's a very good implementation
practice, but it requires that if you explain to an international
audience how to activate UTF-8 locales, you had better use a non-US/CA
locale. (en_GB.UTF-8 for instance seems like an excellent choice ... :)

Markus


Re: [I18n]So, will Bidi+Xterm happen ?

2002-02-28 Thread Markus Kuhn


Nadim Shaikli wrote on 2002-02-28 23:48 UTC:
 What are your comments on mlterm, patch27, biditext (have you used 'em) ?

Can you send me a compact, exact specification of the bidi
semantics of these implementations? I haven't seen one yet and I don't
have the time to reverse engineer these.

If cat works, this means nothing, as this just tests what the terminal
does when you send paragraphs with CRLF terminated lines to it and the
cursor is at the bottom of the screen. This tests only the most trivial
case of bidi functionality. I am far more worried about the sort of ESC
sequences that vim, readline, and ncurses use to talk to the terminal
and how they interact with the bidi. What does it mean to delete a
character in bidi mode (in which direction does it move and what happens
if it hits a bidi boundary), etc. Will it work with the tty cooked mode?

Forget about cat, think about editors, starting from the most primitive
ones.

Markus


Re: [I18n]BiDi rant

2002-02-13 Thread Markus Kuhn


Mark Leisher wrote on 2002-02-13 00:29 UTC:
 I respect Markus too much to think this was anything more than a subconscious
 plea for simplicity and symmetry, born of irritation with messy reality.  Sort
 of absentmindedly muttering out loud when someone is nearby.

What I primarily wanted to remind people of is that bidi and VT100-style
terminal semantics do not mix well at all, and that just repeatedly
reminding us of the user requirements/wishes/dreams in that area will
not change the fact that it is a fundamentally tedious and difficult
subject. There are perhaps good reasons why ECMA-48 hasn't yet been fully
implemented, and I perceive at least some agreement that xterm is
probably not the right level at which to implement bidi, in particular
not for editing and cursor control. There are good reasons why there is
no long-standing successful commercial tradition of Arabic/Hebrew
VT100-style terminals. I think users of Arabic, Hebrew (and perhaps also
Indic) scripts should best focus on non-terminal GUI applications, where
bidi can be easily performed in one go at the paragraph level, and simply
not expect from the character-cell terminal environment the exact
same level of functionality and convenience that LTR users enjoy.

Markus


Re: [I18n]BiDi rant

2002-02-12 Thread Markus Kuhn


Tzafrir Cohen wrote on 2002-02-12 18:46 UTC:
  I think it might be a good idea to really keep bidi completely out of
  xterm. If people want to play around with bidi terminal semantics, then
  I would suggest that they build a filter that can be plugged in between
  the LTR terminal and the application, just like Juliusz' luit does
  already for non-UTF-8 encodings.
 
 How can I disable bidi support at run-time with such a model?

The bidi filter would intercept both the character streams to and from
the terminal emulator. It therefore can intercept not only ESC sequences
from the applications but also special hot-key keystroke sequences
from the terminal that can be used to change its parameters or bring up
a little menu.

GNU screen does all this already. It was written primarily to

  a) allow you to multiplex several terminal connections via
     a single physical or emulated terminal
  b) add support for cut&paste to physical or emulated terminals
     that don't have this facility
  c) allow you to detach a virtual terminal and move it onto another
     physical terminal without closing the session
  d) provide a high-quality emulation of a VT100 terminal on
     a low-quality termcap terminal
  e) perform character encoding conversion and/or transliteration
     (luit does that part as well)

and I guess, bidi things could be added to it as well.

I got my first degree at the University of Erlangen, where lots of
terminal users at the CS department used screen (because
VT100-compatible terminals were available ubiquitously in 1990, but good
high-res monitors for X11 were still relatively rare in undergraduate
teaching rooms). I am repeatedly surprised how little this marvelous
terminal emulator tool is known elsewhere today.

ftp://ftp.uni-erlangen.de/pub/utilities/screen/
ftp://ftp.uni-erlangen.de/pub/utilities/screen/private/screen-3.9.9beta1.tar.gz

The advantage of the filter approach is that it works with pretty much
any terminal emulator, not only with xterm. You can also decide in
remote communication scenarios, whether to install the filter on the
host or on the terminal emulator machine.

If you want to play around with the idea, look at either screen or
luit as freely available starting points.

http://www.pps.jussieu.fr/~jch/software/luit/

 What you cannot automate, however, is the people reading those texts. It
 is also impossible to reprint all the existing texts (consider that the
 bible will have to written in original hebrew, for instance). As I said:
 you are not the first to suggest this.

It's certainly difficult, but not impossible. Turkey is the major
successful case I know of, where a script reform has succeeded. It was
driven by a major political move to make the country overall more
secular and compatible with Europe. Germany's abandonment of Fraktur
probably doesn't count, as that's really just a different font style,
not a fundamentally differently structured alphabet or reading direction.
Does anyone know of any other examples of successful major script
reforms (apart from the semi-successful Soviet attempts to force all
their republics to switch to Cyrillic)?

Markus


[I18n]Re: XLFD subsetting and bdftruncate

2002-01-22 Thread Markus Kuhn


Moe Elzubeir wrote on 2002-01-22 17:08 UTC:
 The subsetting system is in place already, so now what?

I still have not fully understood what exactly is in place
and how well it works in X11R6. For example

  xfd -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-*-75-75-C-*-ISO10646-1[0_0xff]'

works (and returns just the Latin-1 part of the Unicode font) but then

  xfd -fn '-Misc-Fixed-Medium-R-SemiCondensed--13-120-75-75-C-60-ISO10646-1[0_0xff]'

does *NOT* work and returns (when no bdftruncate is used) the full 700
kilobyte large XFontStruct that we fear so much.

What is going on here? I suspect nothing has actually changed since I
did my tests a few years ago, it is just that Juliusz's example XLFDs
contained wildcards at the right place, whereas I always used the full
XLFD as it stands in the BDF file. What difference does that make
in the font mechanics of the X server?

This is getting stranger and stranger, and before we start to rely on the
subsetting, I strongly suggest that someone looks into exactly which
parts of it work, so that it gets properly documented first. Or
eliminates what might just be a bug, namely that subsetting only works
with enough wildcards.

Markus


Re: [I18n]Re: [Devel] Re: [Fonts]Another approach to text in X

2002-01-20 Thread Markus Kuhn


Alexander Gelfenbain wrote on 2002-01-17 19:59 UTC:
 I can confirm that the license ST will be released with is BSD+ which is
 standard BSD with the following clause:

   * You acknowledge that this software is not designed, licensed or indended
   * for use in the design, construction, operation or maintenance of any
   * nuclear facility.

Just curious: Was this a legal or political requirement?

I'm not sure whether the high-energy physicists on the other side of the
street from here will in practice be aware of such a strange restriction
once they get this package via the next SuSE or Solaris update on their
office machines. They are in the profession of doing rather cruel things
to atomic nuclei, and they design and run facilities to do so.

 We are working on publicly releasing the docs and placing the source code on 
 sourceforge or some other public CVS server. Please stay tuned.

I'm very much looking forward to seeing it.

Markus


Re: [I18n]U+3000 limit

2002-01-10 Thread Markus Kuhn


Tomohiro KUBOTA wrote on 2002-01-10 10:58 UTC:
 At Wed, 09 Jan 2002 18:07:03 +,
 Markus Kuhn wrote:
 
  .. unless an explicit subrange specification is present, such that
  people have to write
  
*-iso10646-1[0_0x]
  
  if they are sure that they want to have the full font.
  
  In other words, allow the specification of a default subrange for
  sparsely populated ISO10646-1 fonts (e.g., those with more than 90% of
  their characters below 0x3000).
 
 How such range limitation will be used?  By knowledged end-users
 who knows (s)he doesn't need U+3000 characters?  Or, automatically
 set based on locale?  Or, as a hard-coded default font by foolish
 software developers who assume computers are used only by U+3000
 people?

You could for example simply cut the Unicode character space into 16
intervals 0x0XXX, 0x1XXX, etc. and open each interval as soon as you
encounter a glyph from it. Many widget sets (e.g., Tk) open fonts only
when needed, and that extends naturally to font subranges.

Markus


[I18n]Re: U+3000 limit

2002-01-08 Thread Markus Kuhn


Moe Elzubeir wrote on 2002-01-08 00:12 UTC:
 I have been looking into the U+3000 limit and how the 10x20 font is
 being truncated to save memory space.
 This 'truncation' of the 10x20 for 'optimization' is seriously hampering
 our efforts to bring Arabic support on platforms where XFree86 runs.

I have updated

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#xfontstruct

to tell the full story on this subject.

Markus


Re: [I18n]U+3000 limit

2002-01-08 Thread Markus Kuhn


Juliusz Chroboczek wrote on 2002-01-08 14:16 UTC:
 Font subsetting is fully implemented in the BDF, PCF, Type 1, Speedo
 and freetype backends.  I haven't checked the SNF or X-TT backends.
 
 Try
 
   xfd -fn '-misc-fixed-medium-r-semicondensed--13-*-75-75-c-*-iso8859-1[65_90]'

Very nice, I hadn't seen that! Works fine for my XFree86 4.0.3
installation here. Since which release exactly did this work?

So I think, we can now drop bdftruncate from the ucs-fonts installation
procedure, as people merely have to add [0_0x31ff] to an XLFD to achieve
the same effect.

Any opinions?

Markus


[I18n]Re: RENDER performance

2001-12-29 Thread Markus Kuhn


Keith Packard wrote on 2001-12-28 19:54 UTC:
 I should have monochrome text running in a week or so to give people a 
 chance to experiment with performance over links of various sorts.  When 
 I've done this in other environments, I've found performance to be 
 acceptable down to 2B ISDN speeds; others may have different opinions.

I assume that is with some contemporary pixel size r. Unless you use
some good compression technique, the amount of data to transfer will be
proportional to r^{-2}. With some good textual image compression systems
(PNG, G4FAX, JBIG, etc.) used on the bitmaps, it might become
proportional to around r^{-1.3}. Pixel sizes for color CRTs have
stabilized now at around 0.22-0.25 mm, as smaller aperture masks are not
feasible. But who knows what's coming next? Pixel sizes down to
0.05-0.10 mm, as we have already with laser printers, would certainly be
desirable for e-book applications, etc.
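
To put rough numbers on this scaling (a worked example under the
exponents stated above, not a measurement): write V(r) for the cost of
sending a given area of text at pixel size r, and compare 0.25 mm
pixels with 0.10 mm pixels.

```latex
\[
  \frac{V(0.10\,\mathrm{mm})}{V(0.25\,\mathrm{mm})}
  = \left(\frac{0.25}{0.10}\right)^{2} = 6.25
  \quad\text{(uncompressed, } V \propto r^{-2}\text{)}
\]
\[
  \frac{V(0.10\,\mathrm{mm})}{V(0.25\,\mathrm{mm})}
  = \left(\frac{0.25}{0.10}\right)^{1.3} \approx 3.3
  \quad\text{(compressed, } V \propto r^{-1.3}\text{)}
\]
```

So a link that is just acceptable at CRT resolution would, on these
assumptions, need to be roughly three to six times faster at the upper
end of the laser-printer range.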

Markus


[I18n]Re: X and Supplementary Planes

2001-12-28 Thread Markus Kuhn


Roozbeh Pournader wrote on 2001-12-27 22:58 UTC:
 I remember the discussion here about the font naming and structure issues
 for non-BMP characters. But I cannot remember the outcome (and if there 
 were oppositions). Since we are thinking about doing some work on Pango to 
 support non-BMP characters, I wanted to ask for a briefing...

I am more and more convinced that if we are going to do anything of this
sort on the old XLFD front, then it should be the definition of a new
ISO10646-C encoding, which is a glyph encoding and which has in its
properties character/glyph mappings. This encoding would come together
with a few small and highly efficient C functions:

  makeiso10646cglyphmap(XFontStruct *font, iso10646cglyphmap *map);

reads the character-to-glyph mapping table from the font
properties into a compact and efficient in-memory representation

  freeiso10646cglyphmap(iso10646cglyphmap *map);

frees that in-memory representation

  mbtoiso10646c(char *string, iso10646cglyphmap *map, XChar2b *output);
  wctoiso10646c(wchar_t *string, iso10646cglyphmap *map, XChar2b *output);

take a Unicode character string and convert it to an XChar2b glyph
string suitable for output by XDrawString16 with the ISO10646-C font
from which the iso10646cglyphmap was extracted.

ISO10646-C fonts would still be limited to not more than 64
kibiglyphs, but these can come from anywhere in UCS, not just from the
BMP. This solution also easily provides for glyph substitution, such
that we can finally handle the Indic fonts. It solves the
huge-XFontStruct problem of ISO10646-1, as XFontStruct now grows
proportionally with the number of glyphs, not with the highest
character code. It could also provide for simple overstriking combining
characters, but then the glyphs for combining characters would have to
be stored with negative width inside an ISO10646-C font. It can even
provide support for variable combining accent positions, by having
several alternative combining glyphs with accents at different heights
for the same combining character, and the ligature substitution tables
would encode which combining glyph to use with which base character.

Looks all very easily doable to me. Someone would just have to sit down
and write a proper spec for ISO10646-C fonts plus the above-mentioned
client-side Unicode-to-ISO10646-C character-to-glyph conversion routines,
and then we can start producing fonts and tools.

Unfortunately, I don't have the time to start this in the foreseeable
future. Any volunteers interested in writing a first draft (preferably
in the same troff format in which the other X specs are already
written)?

Markus

P.S.: The C in ISO10646-C stands for combining, complex, compact, or
character-glyph mapped, as you prefer. (And please don't start again
with the fuss about C not being a part number of an ISO standard.)
For those who think that all this is obsolete, remember that it is the
only solution proposed so far that makes efficient complex script
rendering available to legacy X terminals with immutable ROM servers and
no RENDER or ST extension.

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/


Re: [I18n]US-ASCII part of CJK TTFs served by freetype and xtt backends

2001-10-25 Thread Markus Kuhn


On Wed, 24 Oct 2001, Jungshik Shin wrote:
 JC If you desire a different behaviour, you should either try to get your
 JC applications to work with `-p-' fonts, or push for a ``biwidth'' `-b-'
 JC spacing type to be included in a future versions of the XLFD.

  If the answer to my two questions above is no, I don't think '-b-'
 is necessary. '-b-' is only necessary for 'iso10646-1' fonts derived
 from CJK TTFs, but I'm not talking about them (for them, I suggested
 'subsetting' as a way around in my previous mesg.). Of course, if the
 answer is yes, my point is moot.

We discussed biwidth (-b-) fonts ages ago, but since then I thought it had
been agreed that splitting bi-width fonts up into two charcell (-c-) XLFDs
is actually better, because it leaves the application to decide which
width to use for which character, especially as CJK and Western habits
differ here significantly.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/


Re: [I18n]xterm-158, XIM and UTF-8

2001-09-14 Thread Markus Kuhn


On Wed, 12 Sep 2001, Steve Swales wrote:
 XIM is working very well with the CSI  (code-set independent) version of xterm
 provided by Li18nux.org (patches from IBM).  It works equally well in our UTF-8
 locales and our non-UTF-8 locales.  Because of this (CSI), Sun will be adopting
 this version of xterm, rather than the utf-8 hardwired one, for a future release
 of Solaris.  We are working with the patch developers to enhance and extend this
 implementation to make it a fully functional, code-set independent,
 internationalized terminal emulator.  We will, obviously, be providing these
 enhancements back to the community, and Sun will be promoting this xterm at
 X.Org as well.

I'm looking forward to seeing these enhancements to the X.Org xterm.

Like Juliusz, I hope that Sun is aware that the current CSI API provides
significantly more restricted functionality than what our existing
hardwired xterm already offers. I hope Sun is fully aware of the
significant extensions that have to be made to the CSI concept (which was
originally developed purely with the requirements of ISO 8859 and CJK
legacy encodings in mind) in order to cover the additional specific
functionality required for proper Unicode support.

Before XFree86 considers abandoning its current xterm UCS extensions,
I'd hope that Sun's CSI version will offer equivalent functionality,
for example:

  - Support for at least two overstriking combining characters per
    base character, as they are essential for the Thai, Lao and other
    scripts.

  - Selection of glyphs from the single-width and double-width font
    based on either libc wcwidth() or the current XFree86 convention
    documented in

      http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

  - Support for UTF8_STRING selections independent of the current locale,
    in order to facilitate the simple and effective exchange of data
    between applications running in different locales or with the many
    important already existing Unicode-only applications.
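
The first two points can be sketched as a simple character-cell model.
This is only an illustration and not code from xterm: the names cell,
cell_width and put_char are hypothetical, and cell_width covers just a
handful of ranges (a few combining marks and a few East Asian wide
blocks) where a real implementation would use libc wcwidth() or the full
table in wcwidth.c.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_COMBINING 2

/* One terminal cell: a base character plus up to two combining marks. */
typedef struct {
    uint32_t base;
    uint32_t comb[MAX_COMBINING];
    int ncomb;
    int width;   /* 1: use the single-width font, 2: the double-width font */
} cell;

/* Drastically simplified width function (assumption: subset of ranges). */
int cell_width(uint32_t ucs)
{
    /* Combining diacritical marks and some Thai combining marks. */
    if ((ucs >= 0x0300 && ucs <= 0x036F) ||
        ucs == 0x0E31 ||
        (ucs >= 0x0E34 && ucs <= 0x0E3A) ||
        (ucs >= 0x0E47 && ucs <= 0x0E4E))
        return 0;
    /* Hangul Jamo, CJK ideographs, Hangul syllables, fullwidth forms. */
    if ((ucs >= 0x1100 && ucs <= 0x115F) ||
        (ucs >= 0x4E00 && ucs <= 0x9FFF) ||
        (ucs >= 0xAC00 && ucs <= 0xD7A3) ||
        (ucs >= 0xFF01 && ucs <= 0xFF60))
        return 2;
    return 1;
}

/* Append one character to the row. Zero-width characters are attached to
 * the preceding cell. Returns 1 if the character was stored, 0 if it had
 * to be dropped (no base cell, too many marks, or row full). */
int put_char(cell *cells, size_t *ncells, size_t max_cells, uint32_t ucs)
{
    int w = cell_width(ucs);
    if (w == 0) {
        if (*ncells == 0)
            return 0;
        cell *last = &cells[*ncells - 1];
        if (last->ncomb >= MAX_COMBINING)
            return 0;
        last->comb[last->ncomb++] = ucs;
        return 1;
    }
    if (*ncells >= max_cells)
        return 0;
    cell *c = &cells[(*ncells)++];
    c->base = ucs;
    c->ncomb = 0;
    c->comb[0] = c->comb[1] = 0;
    c->width = w;
    return 1;
}
```

With this model, a Thai sequence such as U+0E01 U+0E34 U+0E48 ends up in
a single cell with two overstruck marks, while U+4E2D occupies one cell
that is drawn from the double-width font.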

I also recommend a careful comparison of the currently used rather buggy
and incomplete X.Org keysym-Unicode table with the more up-to-date
XFree86 table on

  http://www.cl.cam.ac.uk/~mgk25/ucs/keysym2ucs.c
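
For orientation, a minimal sketch of the two cases of a keysym-to-Unicode
conversion that need no table at all, assuming the usual conventions that
Latin-1 keysyms have the same values as their code points and that
keysyms of the form 0x01000000 + code point encode Unicode directly. The
function name keysym_to_ucs is hypothetical. All legacy keysyms, which
are exactly where the two tables need comparing, are reported as unmapped
here.

```c
#include <stdint.h>

/* Returns the UCS code point for a keysym, or -1 if this sketch has no
 * mapping for it. A complete implementation would fall back to a sorted
 * table of legacy keysyms at the point where this one gives up. */
long keysym_to_ucs(uint32_t keysym)
{
    /* Latin-1 keysyms are identical to the corresponding code points. */
    if ((keysym >= 0x0020 && keysym <= 0x007e) ||
        (keysym >= 0x00a0 && keysym <= 0x00ff))
        return (long)keysym;

    /* Directly encoded Unicode keysyms: 0x01000000 + code point. */
    if ((keysym & 0xff000000) == 0x01000000)
        return (long)(keysym & 0x00ffffff);

    return -1;
}
```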

If you are working on an XIM for UTF-8 locales, I'd also like to draw your
attention to

  ISO/IEC 14755
  Information Technology -- Input methods to enter characters from
  the repertoire of ISO/IEC 10646 with a keyboard or other input
  devices
  http://www.cl.cam.ac.uk/~mgk25/volatile/ISO-14755.pdf

which describes a number of basic universal character entry methods
that would be extremely useful to have integrated into XIM, such that
independent of the current locale, characters can always also be entered
into any application via their Unicode hex value.
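
A minimal sketch of the core of such a hex entry method, assuming the
input method has already collected the typed hex digits as a string: it
parses them into a code point and encodes that as UTF-8. The function
names parse_hex_ucs and ucs_to_utf8 are hypothetical, and the sketch
assumes the code space ends at U+10FFFF and rejects surrogates. How the
digits are collected (key combination, feedback, termination) is left to
the standard.

```c
#include <stddef.h>
#include <stdint.h>

/* Parse a string of hex digits into a code point. Returns -1 for an
 * empty string, a non-hex character, a surrogate, or a value above
 * U+10FFFF (the latter limit is an assumption of this sketch). */
long parse_hex_ucs(const char *digits)
{
    uint32_t ucs = 0;
    size_t n = 0;
    for (; digits[n] != '\0'; n++) {
        char c = digits[n];
        uint32_t v;
        if (c >= '0' && c <= '9')
            v = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f')
            v = (uint32_t)(c - 'a') + 10;
        else if (c >= 'A' && c <= 'F')
            v = (uint32_t)(c - 'A') + 10;
        else
            return -1;
        ucs = (ucs << 4) | v;
        if (ucs > 0x10FFFF)
            return -1;
    }
    if (n == 0 || (ucs >= 0xD800 && ucs <= 0xDFFF))
        return -1;
    return (long)ucs;
}

/* Encode a code point (at most U+10FFFF) as UTF-8 into buf, which must
 * hold at least 4 bytes. Returns the number of bytes written. */
size_t ucs_to_utf8(uint32_t ucs, unsigned char *buf)
{
    if (ucs < 0x80) {
        buf[0] = (unsigned char)ucs;
        return 1;
    }
    if (ucs < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (ucs >> 6));
        buf[1] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 2;
    }
    if (ucs < 0x10000) {
        buf[0] = (unsigned char)(0xE0 | (ucs >> 12));
        buf[1] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (ucs & 0x3F));
        return 3;
    }
    buf[0] = (unsigned char)(0xF0 | (ucs >> 18));
    buf[1] = (unsigned char)(0x80 | ((ucs >> 12) & 0x3F));
    buf[2] = (unsigned char)(0x80 | ((ucs >> 6) & 0x3F));
    buf[3] = (unsigned char)(0x80 | (ucs & 0x3F));
    return 4;
}
```

Because the conversion goes through the code point and never through the
locale's character set, the same entry method works identically in every
locale that the application can receive UTF-8 in.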

Our experience has been that the interactions of UTF-8 and the VT100
semantics offered by xterm can be very tricky, and I'd like to encourage
you to make development snapshots of your xterm release available for
alpha testing by Li18nux, XFree86, and text mode editor developers early
and often, so that we can provide thorough debugging and feedback for
the implementation long before it sets a standard by becoming part of an
X.Org release.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: http://www.cl.cam.ac.uk/~mgk25/
