RE: Linux console internationalization

Beni Cherniavsky Thu, 14 Aug 2003 14:54:46 -0700

Kent Karlsson wrote on 2003-08-12:

> By and large, "decomposed" vs. "precomposed" only makes sense
> for Latin, Greek, Cyrillic, Hangul, and Hiragana/Katakana.
> There are precomposed letters also for other scripts, but mostly they
> are not to be recommended.
>
Hebrew also has quite a lot of precomposed characters, and they are not
recommended either (in fact they are very annoying, because xterm
greedily composes them but most fonts lack them, so you get empty
boxes for a lot of consonant + vowel combinations but not for others...).


> Why do something entirely different for the "console". Why not adapt
> XKB so that it, and its data files, can work for the "console" too?
> (Likewise for an input method mechanism (XIM??).)
>
XKB might or might not be a good choice.  What's certain is that the
world needs fewer keymap formats.  Here are the open-source systems and
applications I can remember right now, each having its own keymap format:

- X: xmodmap and XKB
- Linux console
- Emacs (leim)
- VIM
- LyX
- TeXmacs
- Yudit
- mined
- Geresh
- Allegro (gaming library)

There are dozens more; I just don't use them and/or don't remember
them now.  And quite obviously, most of them just duplicate each
other's work in creating the various keymaps.  Only the first three
have comprehensive databases AFAIK.

The questions are:

1. Why should anything except the low-level X/console bother
   with keymaps at all?

2. Why can no two things in the world use a common format???

The first question has some reasonable answers:

- Applications have more knowledge about the appropriateness of
  remapping than the dumb underlying system:

  - Emacs only remaps keys bound to `self-insert-command`, leaving the
    keys with modifiers available at all times.

    - This could be approximated at a low level by not remapping keys
      with modifiers (and I'm gonna do precisely this when I learn XKB).
      It doesn't make much sense to send e.g. Control-Meta-<aleph>;
      I've never seen an application that binds anything to it ;-).
      It would still not help with multi-key sequences that contain
      plain characters (e.g. `C-h k`).

  - Emacs maintains input method state per buffer.  This is not
    critical but handy, especially when switching to the minibuffer to
    e.g. type a filename to open.

  - VIM remaps only in insert mode.  In command or ':' mode it
    disables the mapping.  Remapping at a low level would be particularly
    painful in vi, since commands are plain characters...

  Conclusion: to eliminate this reason, we would at a minimum have to
  define a protocol (escape sequences) allowing the application to
  turn the keymapping on and off.  This would break over high-latency
  connections (keys already in flight when the application toggles the
  mapping would be translated the wrong way) and would not address the
  situations where the application doesn't know whether to map or not
  until it sees the key.  So the best solution would be a format for
  transmitting keys to applications that contains the keys both before
  and after the translation.

- Applications can implement smarter input methods that are too
  complex for inclusion in low-level mechanisms.  Emacs' leim ranges
  from simple 1:1 tables to complex Asian input methods using the full
  powers of Emacs Lisp.

  This reason is properly addressed by defining input method protocols
  that allow one user-space program to service the whole system.  This
  is not so easy because complex input methods need access to the
  current contents of the buffer (requiring cooperation of the
  application!) and need screen real estate to interact with the user
  (which again requires cooperation to place it well and, more
  importantly, makes the input methods dependent on system-specific
  APIs to carry out the interaction).

  - XIM, I'm afraid, can't work in the console; it's too X-centric:
    XIM servers negotiate with applications and draw to the screen
    with X requests - and we don't want to implement a whole X server
    as part of the console ;-).  [@@@ Am I talking nonsense here?]

  - There are newer protocols that are intended to be portable,
    promising eventually to service Windows, OS X, X and the console
    from the same server source.  I don't know more.

- Many of these applications are portable to many systems and don't
  want keyboard input to be their system-dependent weak point.
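To make the "transmit both" idea from the conclusion above concrete, here
is a minimal Python sketch.  Everything in it is hypothetical:
`KeyEvent`, `emacs_like` and `vim_like` are illustrative names, not part
of any existing terminal protocol.  It assumes each key arrives carrying
its untranslated character, its translated character and the set of held
modifiers, and shows how an Emacs-like or vi-like application could then
choose per key, including the `C-h k` case that a low-level rule can't
handle:

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical key record carrying the key before and after translation."""
    raw: str         # character the key gives with no layout mapping, e.g. "t"
    translated: str  # character after the active keymap, e.g. Hebrew "א"
    modifiers: FrozenSet[str] = frozenset()  # e.g. frozenset({"Control"})


def emacs_like(event: KeyEvent, in_key_sequence: bool = False) -> str:
    """Use the translated character only for plain, self-inserting keys."""
    # Keys with Control/Meta keep their raw meaning so bindings stay reachable.
    # Inside a prefix sequence such as `C-h k`, the plain `k` must stay raw too,
    # which a low-level "skip modified keys" rule cannot know.
    if in_key_sequence or event.modifiers & {"Control", "Meta"}:
        return event.raw
    return event.translated


def vim_like(event: KeyEvent, mode: str) -> str:
    """Translate only in insert mode; commands are plain characters."""
    return event.translated if mode == "insert" else event.raw


if __name__ == "__main__":
    t_key = KeyEvent(raw="t", translated="א")
    print(emacs_like(t_key))                        # self-inserting key: א
    print(emacs_like(t_key, in_key_sequence=True))  # after C-h: t
    print(vim_like(t_key, mode="normal"))           # vi command: t
```

The decision stays in the application, which knows its own mode, while
the keymap itself can still live in one place below it.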

As for the second question, there are no answers.  It's just stupid.
Sorry.  All the simple table-based apps could trivially share a
format.  The more complex ones could try to.

Of the above formats, only XKB, console and allegro have an
understanding of low-level keyboard layout (all others start with what
the OS gives them) and only XKB is targeted at an output rich enough
to represent Shift-Control-Alt-Cokebottle.

Now, *please*, let's forget the old curses model of keys, limited to
Meta + ASCII + the function keys of some old (although big) terminal
and shifted variants of *some* of them.  Let's define something that
would allow the console to describe Control-Shift-Meta-Left to Emacs.
I guess this would involve modifiers translated to prefixes (as is
already done for Meta->ESC), textual function key names (so we never
run out of them) and some mechanism to communicate both the raw
sequences and the Unicode character content.
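A rough sketch, in Python, of what such a textual key description might
look like.  The notation is an assumption modeled loosely on Emacs key
descriptions (modifier prefixes like `C-` and `M-`, function keys as
names in angle brackets); `encode_key` and `decode_key` are hypothetical
helpers, and the sketch covers only the modifier and naming part, not the
raw sequences:

```python
from typing import FrozenSet, Tuple

# Hypothetical notation: modifier prefixes in a fixed order, then either a
# single character (one code point) or a function-key name in angle brackets.
PREFIXES = [("Control", "C-"), ("Meta", "M-"), ("Shift", "S-"), ("Super", "s-")]


def encode_key(modifiers: FrozenSet[str], key: str) -> str:
    """Describe a key, e.g. ({"Control", "Shift", "Meta"}, "Left") -> "C-M-S-<Left>"."""
    prefix = "".join(p for name, p in PREFIXES if name in modifiers)
    body = key if len(key) == 1 else f"<{key}>"  # any name works: "F13", "Left"
    return prefix + body


def decode_key(text: str) -> Tuple[FrozenSet[str], str]:
    """Inverse of encode_key; accepts the prefixes in any order."""
    mods = set()
    stripped = True
    while stripped:
        stripped = False
        for name, p in PREFIXES:
            # Require something after the prefix so "C--" means Control + "-".
            if text.startswith(p) and len(text) > len(p):
                mods.add(name)
                text = text[len(p):]
                stripped = True
    if len(text) > 2 and text.startswith("<") and text.endswith(">"):
        text = text[1:-1]
    return frozenset(mods), text


if __name__ == "__main__":
    desc = encode_key(frozenset({"Control", "Shift", "Meta"}), "Left")
    print(desc)              # C-M-S-<Left>
    print(decode_key(desc))
```

In this scheme a new function key is just a new name, so there is no
fixed table of escape sequences to run out of; the raw bytes would have
to travel alongside in some other field.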

So back to XKB.  It's powerful enough to handle almost any need.  Its
formats are not ideal, though.  They carry some X cruft, like the limit
of 4 groups and resolution to X keysyms instead of Unicode, and general
complexity, like custom names for all physical keys.  The Linux
console and Allegro also suffer from this disease.  Tell me, why do I
have to remember that the key is named `<TLDE>` in XKB and the
resulting value is named `asciitilde` in the Linux console, when I
could have written ``~`` for both?

I think that most key mapping tasks can be done simply as a sequence
of mappings on Unicode strings, applied one after the other.  So the
basic mapping from scancodes would use the well-known QWERTY names,
like ``q``, and subsequent layers would translate them to a non-QWERTY
layout (if needed).  This way, the number of arbitrary names in the
system is minimized, easing reuse in other environments.
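A minimal sketch of such a layered pipeline in Python, assuming each
layer is a plain dictionary and that strings a layer does not mention
pass through unchanged.  The base table uses Linux input keycodes
(KEY_Q = 16 through KEY_Y = 21) and covers only six keys; the Dvorak and
Hebrew layers likewise list just the top-row letters of those standard
layouts, and all table and function names are made up for illustration:

```python
from typing import Dict, List

Layer = Dict[str, str]

# Base layer: Linux input keycodes to the QWERTY names of the keys.
KEYCODE_TO_QWERTY: Dict[int, str] = {16: "q", 17: "w", 18: "e", 19: "r", 20: "t", 21: "y"}

# Optional layers on top, keyed by QWERTY names rather than invented key names.
QWERTY_TO_DVORAK: Layer = {"q": "'", "w": ",", "e": ".", "r": "p", "t": "y", "y": "f"}
QWERTY_TO_HEBREW: Layer = {"q": "/", "w": "'", "e": "ק", "r": "ר", "t": "א", "y": "ט"}


def translate(keycode: int, layers: List[Layer]) -> str:
    """Map a keycode to text by applying each layer in order."""
    text = KEYCODE_TO_QWERTY.get(keycode, "")  # unknown keycodes give ""
    for layer in layers:
        # Strings a layer does not mention pass through unchanged.
        text = layer.get(text, text)
    return text


if __name__ == "__main__":
    print(translate(20, []))                  # plain QWERTY: t
    print(translate(20, [QWERTY_TO_DVORAK]))  # Dvorak: y
    print(translate(20, [QWERTY_TO_HEBREW]))  # Hebrew: א
```

Because every layer speaks in ordinary characters, the same Hebrew table
could be reused by any application that already receives QWERTY text.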

Modifiers and function keys would need more complex handling, but that
too can be mastered in simpler ways than XKB, I believe.  I am troubled
by the current trend of making XKB keymaps a thing for wizards only
and wrapping them in monstrous XML files driving applications that
only allow you to select from the existing options...

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
