Re: Unicode Keyboard Input Linux

Elvis Presley Mon, 14 Jun 2004 08:45:37 -0700

Hello,


The attached diagram represents my (naive)
understanding of terminal IO under Linux/Unix.

Elvis

PS

The real console is essentially a graphical device,
with screen(=display), keyboard and mouse, and
whatever else might be considered interesting...
Applications do not open the real console directly,
but in theory, they could --in DOS they could: the
interface could be made public; there would have to be
a device special file for the real console, and the
virtual consoles too, and the pseudo terminals... Have
I forgotten anything?

The state of the VC mux is controlled by Alt-Func
keys. The VC mux sends all console IO to the "current"
console (except the Alt-Func keys): you wouldn't need
to run a VC mux in a vc.

A virtual console (vc) holds the state of each
unicode terminal: 1) the display contents, 2) the
keyboard map, 3) the mouse position (yes, because the
vc mux does not do overlapping windows, so the mouse
position is independent for each vc). The keymap would
therefore be part of the virtual console, not the tty
driver (as I had thought), so you could change
keyboards using any Alt-key combination (but Alt-Func
keys are already used by the vc mux).
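
To make the idea concrete, here is a toy model of the vc mux
as described above. It is only a sketch of my mental picture,
not how the Linux kernel implements it; the class names
VirtualConsole and VCMux and the key names like "Alt-F2" are
made up for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualConsole:
    """Per-console state, following the three items listed above."""
    display: list = field(default_factory=list)   # 1) display contents
    keymap: str = "us"                            # 2) keyboard map name
    mouse: tuple = (0, 0)                         # 3) mouse position


class VCMux:
    """Routes console input to the current vc; Alt-Func keys switch it."""

    def __init__(self, count):
        self.consoles = [VirtualConsole() for _ in range(count)]
        self.current = 0

    def key(self, name):
        # Alt-F<n> is consumed by the mux itself and never reaches a vc.
        if name.startswith("Alt-F") and name[5:].isdigit():
            n = int(name[5:])
            if 1 <= n <= len(self.consoles):
                self.current = n - 1
            return
        # Everything else goes to the current vc only.
        self.consoles[self.current].display.append(name)


mux = VCMux(3)
mux.key("a")                    # lands on vc 1
mux.key("Alt-F2")               # switch to vc 2
mux.consoles[1].keymap = "gr"   # changing the keymap affects vc 2 only
mux.key("b")                    # lands on vc 2
print([vc.display for vc in mux.consoles])
print([vc.keymap for vc in mux.consoles])
```

Because the keymap lives in each VirtualConsole object, switching
consoles brings its keymap along, which is the point made above.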

You could not really use keymaps in a traditional tty
configuration anyway, because an ascii terminal can't
display unicode characters, unless you ran a unicode
terminal emulator on a PC and connected to a shell
through a traditional Start/Stop interface. Each vc is
a unicode terminal emulator: it understands utf-8, it
can display utf-8 and it can generate utf-8.

You could put a keymap module in the stream which
translated ascii into utf-8 unicode, but why bother?

Of course, the tty module still must understand
unicode. I don't think this is a big problem, because
the basic repertoire remains the same (=ascii) thanks
to the utf-8 encoding, but I'm sure there are a few
hidden traps.
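
A small sketch of why the ascii repertoire survives, and of
one trap I can think of (erasing a character). The helper
names ascii_bytes and erase_last_char are invented for this
example; a real tty driver is of course not written this way.

```python
text = "ls -l αρχείο.txt\n"
encoded = text.encode("utf-8")

# ASCII characters keep their one-byte ASCII values in utf-8.
print("ls -l".encode("utf-8") == "ls -l".encode("ascii"))


def ascii_bytes(data):
    # Every byte of a non-ascii character has the high bit set, so a
    # byte-oriented tty layer cannot mistake part of a Greek letter
    # for a control character such as newline or NUL.
    return bytes(b for b in data if b < 0x80)


print(ascii_bytes(encoded))


def erase_last_char(data):
    # The trap: erasing one character means removing a whole
    # sequence, not one byte. Continuation bytes look like 0b10xxxxxx,
    # so step back over them to the lead byte.
    end = len(data) - 1
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


print(erase_last_char("αβ".encode("utf-8")).decode("utf-8"))
```

As far as I know, Linux has a termios input flag, IUTF8, that
tells the line discipline to erase whole utf-8 characters in
this way.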

The pseudo-terminal driver (i.e. module) has got to be
a pretty simple device: it just copies everything it
sees to the (traditional) tty driver(=line
discipline). In fact, the VC mux could contain each
VC, then you'd have a big, multiplexed
pseudo-terminal.
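
The master/slave pairing can be tried out from user space.
This is a minimal sketch using Python's standard pty module,
assuming a Unix-like system with pseudo-terminals available;
it shows only that a line written on the master side comes
out on the slave side.

```python
import os
import pty

# openpty() returns two file descriptors: the master side (where a
# terminal emulator sits) and the slave side (what the shell sees).
master, slave = pty.openpty()

# Pretend to be the emulator: "type" a line into the master side.
os.write(master, b"hello\n")

# The line discipline in between delivers the completed line to
# whoever reads the slave side.
line = os.read(slave, 100)
print(line)

os.close(master)
os.close(slave)
```

A shell started on the slave side would receive the line in
exactly the same way, without knowing who is on the master side.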

Anything (module or program) which opens the master
side of a pseudo-terminal is called a terminal
emulator, therefore a 'vc' and an 'xterm' perform the
same function, but in different spaces. I wonder how
much of the software can be reused. You need vc's in
the kernel in the absence of X, to support Linux
virtual terminals. If you run X, you don't need vc's.

Therefore, the remote telnet is a terminal emulator
too. It connects to a pseudo-terminal through its tcp
connection. Kermit, too, would be a terminal emulator,
even though both, as application programs, might be
running in xterms on the remote computers. Kermit has
the interesting property that it can download files
over its "session" connection (unlike ftp) which means
it would work through the firewall: if you can connect
to a telnet server using kermit, you can surely
download files, by changing the mode of the emulator,
a pretty nice feature. This can get pretty confusing.

Is the ftp client a terminal emulator? I think so,
because its control connection is going to be made to
a pseudo terminal, but ftp doesn't use getty to check
the userid, does it? 

A virtual console(=VC) has a set of abstract qualities
which closely resemble the real console. If there were
two types of virtual console, graphical and
character-mode, then the real console could be shared
with X which opens an instance of a graphical virtual
console. There could be more than one instance of X
running, why not? Otherwise, you'd have to choose
between vterms and xterms. They both do the same thing
anyway.

Can I use xterm to connect to a remote X server? If I
could, the X server would have to validate the user's
identity, much like telnet does, using the /etc/passwd
file. In the local scenario, xterm starts up shells
automatically. In X I've got to use some
replacement-for-getty/login program.

Unicode C/C++ Application Programming

"wchar_t is a 32-bit wide character."

1) Does zero(=0x0) still represent end-of-string?

2) Does -1(=0xFFFFFFFF) still represent end-of-stream?
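
Question 1 can be poked at without writing any C, using
Python's ctypes module, which exposes the platform's own
wchar_t. This is only a probe, not an answer from the C
standard, and it says nothing about question 2 (in C the wide
stream functions return WEOF rather than EOF, which this
sketch does not exercise).

```python
import ctypes

# Size of the platform's C wchar_t in bits (32 with glibc on Linux).
wchar_bits = ctypes.sizeof(ctypes.c_wchar) * 8
print(wchar_bits)

# A C wide string built from Python text: ctypes appends one extra
# element, the terminating wide NUL, just as C does for L"αβγ".
buf = ctypes.create_unicode_buffer("αβγ")
print(len(buf), repr(buf[len(buf) - 1]))
```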

Comparing characters would be easy, they compare as
unsigned integers, but sorting them would be a
problem, because you'd want to group all the
(accented) vowels together, according to language
specific rules. In Greek, this wouldn't be a problem,
because monotonic vowels and polytonic vowels, though
occupying different code ranges, are not mixed in the
same word: they are essentially different languages. A
'tonos' is not an 'oxia' or a 'varia'.
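
The difference between comparing and sorting shows up even
with four Greek words. The sketch below sorts once by raw code
point and once with the accents stripped; strip_accents is a
helper invented for this example and is a crude stand-in for
real language-specific collation rules, not a replacement for
them.

```python
import unicodedata


def strip_accents(word):
    # Decompose each letter into base letter + combining marks,
    # then drop the marks (tonos, oxia, varia, ...).
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Normalize to NFC so the accented vowels are single code points.
words = [unicodedata.normalize("NFC", w)
         for w in ["ζήτα", "έψιλον", "βήτα", "άλφα"]]

by_code_point = sorted(words)                      # plain integer comparison
by_base_letter = sorted(words, key=strip_accents)  # accented vowels grouped

print(by_code_point)
print(by_base_letter)
```

In the first list the words starting with an accented vowel
come before 'βήτα', because the precomposed accented vowels
sit below plain alpha in the code chart.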

The editor 'vi' would have to be modified to get/put
wchar_t, so I don't understand why you'd need a
separate unicode editor, or separate unicode
application, whatever it might be.

1) Does 'sort' work on utf-8 input?

2) Does 'grep' (Unix search) work on utf-8 input?

3) Is there a laundry list of Unix filters which need
to be changed to support Internationalization? I know
'cat' doesn't.
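
For the plain fixed-string case, a byte-oriented search should
already work on utf-8 input, because an encoded character can
never match in the middle of another one. The sketch below
compares a byte-wise search with a character-wise one; it is
not how 'grep' is implemented, and the sample lines are made
up.

```python
lines = ["καλημέρα κόσμε", "hello world", "ο κόσμος είναι μικρός"]
pattern = "κόσμ"

# What a byte-oriented fixed-string search would do.
raw_pattern = pattern.encode("utf-8")
byte_hits = [l for l in lines if raw_pattern in l.encode("utf-8")]

# What a character-aware search would do.
char_hits = [l for l in lines if pattern in l]

print(byte_hits == char_hits)
print(len(byte_hits))
```

Case-insensitive matching and character classes are another
matter: there the tool has to know where each character
begins and ends.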

Now, I can create a unicode HTML file using vi. You
see, you don't need a lot of support to do useful work.

Why do Greek newspapers still use ISO 8859-7?

Since utf-8 spends two bytes on every Greek letter
where ISO 8859-7 spends one, it roughly doubles the
size of a Greek text file, so it looks like these older
character sets will be around for a long time. That's
no problem if you're working in Greek and English.
Unicode lets you encode an entire document
without tagging bits and pieces by the char-set they
use (and don't forget to switch keymaps too, a
potential for error -- and your text editor would have
to support each 8-bit character set too, a real
nightmare), but if you're only working in Greek, why
not stick with what you know?
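
The size difference is easy to measure. A minimal sketch,
with two made-up sample strings, one pure Greek and one with
ascii markup around it:

```python
import unicodedata

samples = {
    "plain": "καλημέρα κόσμε",
    "html": "<p>καλημέρα</p>",
}

sizes = {}
for label, text in samples.items():
    # Normalize to NFC so accented vowels are single code points,
    # which is the only form ISO 8859-7 can represent.
    text = unicodedata.normalize("NFC", text)
    legacy = len(text.encode("iso8859_7"))   # one byte per character
    utf8 = len(text.encode("utf-8"))         # two bytes per Greek letter
    sizes[label] = (legacy, utf8)
    print(label, legacy, utf8)
```

Only the Greek letters grow; spaces, punctuation and markup
stay at one byte each, so a real page grows by less than a
factor of two.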

My Microsoft browser(=IE) has problems with ISO Greek
and Windows Greek, especially capital Alpha with
tonos: it gets confused, and displays a box.

Unicode is a much nicer solution, except that utf-8 is
prejudiced against non-english speakers: ascii stays at
one byte per character, while Greek letters take two.
All tags are ascii, but the content can be anything,
just switch keymaps, no need to tag the content again.
However, double the size of the file and you double the
download time too. Now you need a server twice as big.

Posix Locales

It looks to me like the most important distinction
between locales is not language, but national currency
symbol. Many locales use the same language, but each
may have different currency symbols (especially in
Latin America), a mere trifle.

1) What are utf-8 locales? I would have thought that
utf-8 would be applicable across all locales.

Hypothesis: There could be an iso 8859 locale and a
unicode locale for the same "region" for historical
reasons. This is causing the confusion. I've never
worked in Latin-1, or Latin-2, just ascii and unicode,
and I don't even want to think about using a different
copy of the same program for each.
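
The hypothesis fits the way locale names are usually written
on Unix systems: language[_territory][.codeset][@modifier],
so the encoding is one component next to the region, not a
region of its own. Here is a small sketch that splits such
names; parse_locale is a helper written for this example, and
the names fed to it are only examples that may or may not be
installed on a given machine.

```python
def parse_locale(name):
    """Split a locale name: language[_territory][.codeset][@modifier]."""
    name, _, modifier = name.partition("@")
    name, _, codeset = name.partition(".")
    language, _, territory = name.partition("_")
    return {"language": language, "territory": territory,
            "codeset": codeset, "modifier": modifier}


for name in ("el_GR.ISO-8859-7", "el_GR.UTF-8",
             "es_MX.UTF-8", "es_AR.UTF-8"):
    print(name, parse_locale(name))
```

The two el_GR entries differ only in the codeset, and the two
es entries differ only in the territory, which is where things
like the currency symbol come from.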

Conclusion

As you can see, there is a lot of
duplicated(=redundant, better: "overlapping")
functionality(=technology) out there. I don't know how
to trim the tree.

Now, what about left->right and double-column
characters? If you restrict your efforts to the vc
module, you'd have to change xterm too, and do it all
over again. I don't know anything about these
languages --I have never read a Japanese newspaper--
still, it doesn't look too complicated to me.

Regards,

Joe

PS

Can I run a copy of X windows in an xterm?

Is there a version of X which runs as a Microsoft
Window (without the Linux/cygwin)? xterm could start
up a DSO session.

Is there a version of Linux which runs as a Microsoft
Window (not cygwin)?




        
                

<<inline: virtual console diagram gif.GIF>>
