"M.P.N. Peters" wrote on 2002-09-02 12:10 UTC:
> Recently I found out about Unicode and UTF-8. Unfortunately, it raises
> a lot of questions. My first question is, how can I, with a limited
> (= qwerty) keyboard that can generate only about 100 scancodes (I
> think), produce all the keycodes needed to reach for example the
> phonetic characters? In other words, I think the question is how a
> single scancode can have multiple keycodes?
Which input method is most appropriate for Unicode depends a lot on
your application and on which characters you use:
- If you are typing text routinely, you definitely want to have
a keyboard layout well suited to the script that you use.
- Traditional keyboard layouts were designed with the restrictions
of typewriters in mind and do not have dedicated keys for all the
symbols that you might want to use in modern word processing.
Some of the solutions include:
  - Add additional layers to the keyboard, such as an AltGr
    key, or a special "compose key" that initiates entering
    a character by typing in an ASCII mnemonic (like compose + C + O
    to get the copyright sign).
  - Add to editors context-sensitive automatic replacement
    mechanisms that substitute the characters you entered
    with the ones you presumably meant to enter (like the
    "smartquotes" algorithms used by some word processors).
    Some people, including myself, find this approach slightly
    dangerous.
- For more rarely required symbols (e.g., mathematical notation and,
for many people, the phonetic alphabet), choosing them with a
mouse click from an on-screen menu may be a sufficient entry method.
Xterm allows you to do this already today via the cut&paste
mechanism. Just keep a short file that contains, neatly arranged,
the Unicode characters that you need to enter most frequently in
your work, and cut&paste from there. That's the technique I find
myself using most frequently.
- Provide in the keyboard driver a key combination that initiates
hexadecimal entry of a Unicode character, as a fallback mechanism
for expert users.
- Sooner or later, we will have to think about revising the current
national keyboard standards. For example, English keyboards today
lack many very widely used characters (especially all the ones for
which e.g. M$-Word offers keyboard shortcuts), such as directional
quotation marks as well as all the different dashes and hyphens
distinguished in proper typography. A future keyboard standard is
also likely to have dedicated keys for a number of combining accents,
whereas keys on European keyboards for precomposed Latin letters
are likely to vanish, leading to more language-independent keyboard
standards. But that is still some way off.
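To make two of the mechanisms in the list above more concrete, here is a minimal Python sketch of a compose-key lookup and of hexadecimal entry. It is only an illustration, not how any real keyboard driver works: the table `COMPOSE` and the functions `compose` and `hex_entry` are hypothetical names, and apart from compose + C + O the mnemonics are assumptions made up for the example.

```python
# Hypothetical compose table. Only ("C", "O") comes from the text above;
# the other mnemonics are assumptions chosen for illustration.
COMPOSE = {
    ("C", "O"): "\u00a9",  # copyright sign
    ("o", "o"): "\u00b0",  # degree sign (assumed mnemonic)
    ("=", "e"): "\u20ac",  # euro sign (assumed mnemonic)
}


def compose(first, second):
    """Return the character for a two-key ASCII mnemonic, or None if unknown."""
    return COMPOSE.get((first, second))


def hex_entry(digits):
    """Turn typed hexadecimal digits into the Unicode character they name."""
    code_point = int(digits, 16)  # raises ValueError on non-hex input
    if code_point > 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    return chr(code_point)


if __name__ == "__main__":
    # Print code points rather than the characters themselves, so the
    # output does not depend on the terminal encoding.
    print(f"U+{ord(compose('C', 'O')):04X}")
    print(f"U+{ord(hex_entry('0251')):04X}")
```

The hexadecimal path needs no table at all, which is why it works as a fallback for any character, whereas the compose table only covers the mnemonics someone has defined.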
On your Linux box, I recommend that you make full use of the AltGr
key today and use xmodmap to put all the characters that you might
need on the keyboard. For example:
  keysym e = e NoSymbol EuroSign NoSymbol
  keysym g = g NoSymbol sterling NoSymbol
  keysym m = m NoSymbol mu NoSymbol
  keysym d = d NoSymbol degree NoSymbol
  keysym space = space NoSymbol nobreakspace NoSymbol
For Unicode characters for which there doesn't exist a historic X11
keysym, use by convention 0x0100XXYY, where XXYY is the Unicode value of
the character. Xterm and many other X11 applications understand that
convention.
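As a rough illustration of that convention, the following Python sketch computes the 0x0100XXYY value for a 16-bit Unicode code point and formats an xmodmap line in the same shape as the examples above. The helper names `unicode_keysym` and `xmodmap_line` are hypothetical, and the sketch assumes that your xmodmap accepts a numeric keysym value in place of a name, so please check that on your system.

```python
# Prefix used by the 0x0100XXYY convention described above.
UNICODE_KEYSYM_BASE = 0x01000000


def unicode_keysym(code_point):
    """Return the keysym for a 16-bit code point as a hex string (0x0100XXYY)."""
    if not 0 <= code_point <= 0xFFFF:
        # The XXYY form in the text covers 16-bit values only.
        raise ValueError("expected a 16-bit Unicode value")
    return f"0x{UNICODE_KEYSYM_BASE | code_point:08x}"


def xmodmap_line(key, code_point):
    """Format a line that puts the character on the AltGr level of `key`."""
    return f"keysym {key} = {key} NoSymbol {unicode_keysym(code_point)} NoSymbol"


if __name__ == "__main__":
    # U+0251 (LATIN SMALL LETTER ALPHA) is one of the phonetic characters.
    print(xmodmap_line("a", 0x0251))
```

For U+0251 this yields the value 0x01000251, which can then be used like the named keysyms in the xmodmap examples.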
http://www.cl.cam.ac.uk/~mgk25/unicode.html
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/