"M.P.N. Peters" wrote on 2002-09-02 12:10 UTC:
> Recently I found out about Unicode and UTF-8. Unfortunately, it raises
> a lot of questions. My first question is, how can I, with a limited
> (= qwerty) keyboard that can generate only about 100 scancodes (I
> think), produce all the keycodes needed to reach for example the
> phonetic characters? In other words, I think the question is how a
> single scancode can have multiple keycodes?

Which input method is most appropriate for Unicode depends a lot on
your application and on which characters you use:

  - If you type text routinely, you definitely want a keyboard
    layout that is well suited to the script you use.

  - Traditional keyboard layouts were designed with the restrictions
    of typewriters in mind and do not have dedicated keys for all the
    symbols that you might want to use in modern word processing.
    Some of the solutions include:

      - Add extra layers to the keyboard, such as an AltGr key, or a
        special "compose key" that initiates entering a character by
        typing an ASCII mnemonic (like compose + C + O to get the
        copyright sign).

      - Add context-sensitive automatic replacement mechanisms to
        editors that replace the characters you typed with the ones
        you presumably meant (like the "smartquotes" algorithms used
        by some word processors). Some people, including myself, find
        this approach slightly dangerous.

      - For more rarely required symbols (e.g., mathematical notation
        and, for many people, also the phonetic alphabet), choosing them
        with a mouse click from an on-screen menu might be a sufficient
        entry method. Xterm already lets you do this today via the
        cut&paste mechanism: just keep a short file containing, neatly
        arranged, the Unicode characters that you need to enter most
        frequently in your work, and cut&paste from there. That's the
        technique I find myself using most frequently.

      - Provide a key combination in the keyboard driver that initiates
        hexadecimal entry of a Unicode character, as a fallback mechanism
        for expert users.

  - Sooner or later, we will have to think about revising the current
    national keyboard standards. For example, English keyboards today
    lack many widely used characters (especially all those for which,
    e.g., M$-Word offers keyboard shortcuts), such as directional
    quotation marks and the various dashes and hyphens distinguished
    in proper typography. A future keyboard standard is also likely to
    have dedicated keys for a number of combining accents, whereas the
    keys on European keyboards for precomposed Latin letters are likely
    to vanish, leading to more language-independent keyboard standards.
    But that is still some way off.
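The compose-key and hexadecimal-entry mechanisms listed above both come down to a small state machine in the input layer that turns several keystrokes into one character. The following Python sketch is illustrative only: the key tokens (COMPOSE, HEX, ENTER), the use of Return to end a hex sequence, and the three-entry compose table are hypothetical choices, not the real X11 compose tables or the behaviour of any actual keyboard driver.

```python
COMPOSE = "Multi_key"  # hypothetical token for the compose key
HEX = "HexEntry"       # hypothetical token for the hex-entry key combination
ENTER = "Return"       # hypothetical token that ends a hex sequence

# A tiny, made-up compose table: two ASCII mnemonics -> one character.
COMPOSE_TABLE = {
    ("C", "O"): "\u00a9",  # COPYRIGHT SIGN
    ("o", '"'): "\u00f6",  # LATIN SMALL LETTER O WITH DIAERESIS
    ("e", "="): "\u20ac",  # EURO SIGN
}


def translate(keys):
    """Turn a sequence of key tokens into text."""
    out = []
    it = iter(keys)
    for key in it:
        if key == COMPOSE:
            # Read the next two keys as a mnemonic; unknown ones are dropped.
            mnemonic = (next(it, ""), next(it, ""))
            out.append(COMPOSE_TABLE.get(mnemonic, ""))
        elif key == HEX:
            # Collect hex digits from the same iterator until ENTER.
            digits = []
            for d in it:
                if d == ENTER:
                    break
                digits.append(d)
            try:
                out.append(chr(int("".join(digits), 16)))
            except ValueError:
                pass  # empty, non-hex, or out-of-range input is ignored
        else:
            out.append(key)  # ordinary keys pass through unchanged
    return "".join(out)
```

For example, `translate(["a", COMPOSE, "C", "O", "b", HEX, "2", "0", "1", "4", ENTER])` returns `"a\u00a9b\u2014"`, i.e. "a", the copyright sign, "b" and an em dash. Silently dropping an unknown mnemonic is just one possible policy; a real input method might beep or pass the keys through instead.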

On your Linux box, I recommend that you make full use of the AltGr
key today and use xmodmap to put all the characters you might need
onto the keyboard. For example:

  keysym e = e NoSymbol EuroSign   NoSymbol
  keysym g = g NoSymbol sterling   NoSymbol
  keysym m = m NoSymbol mu         NoSymbol
  keysym d = d NoSymbol degree     NoSymbol
  keysym space = space NoSymbol nobreakspace NoSymbol
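Each keysym line above lists up to four keysyms for one key. In the classic core-protocol keyboard model they are selected by Shift and by Mode_switch, which AltGr is typically bound to on such setups (an assumption about your configuration). This is how a single physical key can yield several characters. The sketch below mimics that selection, including the protocol rule that an empty (NoSymbol) Shift column repeats the first keysym or, for a single letter, uses its uppercase form. The function names are hypothetical, and real X servers (and XKB) handle more cases, such as Caps Lock and full keysym case tables.

```python
def parse_keysym_line(line):
    """Split an xmodmap 'keysym KEY = K1 K2 K3 K4' line into (KEY, [K1..K4])."""
    lhs, rhs = line.split("=", 1)
    key = lhs.split()[1]
    syms = rhs.split()
    syms += ["NoSymbol"] * (4 - len(syms))  # pad missing columns
    return key, syms[:4]


def effective_keysym(syms, shift=False, altgr=False):
    """Pick the keysym produced for a modifier state.

    Columns: 1 = plain, 2 = Shift, 3 = Mode_switch (AltGr),
    4 = Mode_switch + Shift.
    """
    lower, upper = syms[2:4] if altgr else syms[0:2]
    if upper == "NoSymbol":
        # An empty second element repeats the first, except that a
        # single alphabetic letter gets its uppercase form (approximated
        # here with str.upper on one-character keysym names).
        if len(lower) == 1 and lower.isalpha():
            upper = lower.upper()
        else:
            upper = lower
    return upper if shift else lower
```

With the line for the e key, `effective_keysym(syms)` gives `"e"`, adding `shift=True` gives `"E"`, and `altgr=True` gives `"EuroSign"` with or without Shift.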

For Unicode characters that have no historic X11 keysym, use by
convention the keysym value 0x0100XXYY, where XXYY is the hexadecimal
Unicode value of the character. Xterm and many other X11 applications
understand that convention.
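The convention amounts to adding 0x01000000 to the code point. A hypothetical helper is sketched below, assuming that printable Latin-1 characters keep their historic keysyms (which equal their code points) and that the same offset continues beyond U+FFFF, up to 0x0110FFFF. It does not look up named keysyms such as EuroSign.

```python
def unicode_to_keysym(cp: int) -> int:
    """Return an X11 keysym value for the Unicode code point cp."""
    if 0x20 <= cp <= 0x7E or 0xA0 <= cp <= 0xFF:
        return cp                 # historic Latin-1 keysyms equal the code point
    if 0x100 <= cp <= 0x10FFFF:
        return 0x01000000 | cp    # 0x0100XXYY for the BMP, same offset above it
    raise ValueError(f"no keysym convention for U+{cp:04X}")


print(hex(unicode_to_keysym(ord("\u2014"))))  # EM DASH -> 0x1002014
```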

http://www.cl.cam.ac.uk/~mgk25/unicode.html

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
