Re: Linux Console in UTF-8 - current state

Edward H Trager Thu, 03 Oct 2002 12:26:39 -0700


Hi, everyone,


> In the case of CJK and basic display, 18x18 is extravagant--16x16 is
> sufficient for daily purposes, and in extreme cases, 12x12 or 10x10 could
> even be proposed, but they are very very illegible.  Further trimmings in
> size could be arrived at by reducing the repetoire down to just a
> combination of the major commonly used legacy character sets, GB2312,
> Big5, JIS X 0208, and KS X 1001 (or we can wait and see what conclusions
> the IRG is coming up with for a minimal set).

I've been following this thread, as well as other threads on the
Unicode.org list, and I'm trying to think "outside the box" here.
So, please read this (I know it's long) and consider the *PROS* and *CONS*
of the following:

WHY NOT CREATE A COMPREHENSIVE, SCALABLE, STROKE-BASED UNICODE FONT?
--------------------------------------------------------------------

This would require inventing an *Open Standard* for a stroke-based font
format.  The open standard would be designed so that it doesn't have the
64K glyph limit, as that might be a problem for comprehensive Unicode
coverage, especially if you want to include culturally-appropriate CJVK
glyph variants.  Of course, you would also need a rasterizer for scaling
the font(s) -- but it would be easier and simpler to build than any TTF or
OpenType rasterizer -- just ask the FreeType2 experts with all of their
autohinting expertise to help out here!

Comprehensive stroke-based Unicode fonts would also be a lot *easier* and
*faster* to create than any TTF font because you just have to worry about
the path needed to draw a glyph, not the outline.

Proprietary stroke-based fonts for the CJVK subset of Unicode exist from
Agfa Monotype and Bitstream for use in the embedded and TV/broadcasting
market, so people have done this already, at least for CJVK. (And probably
some of the commercial handheld devices that will be using such proprietary
technology are going to be running Linux kernels ...).

What's missing is an Open Source implementation that would create an Open
Standard for stroke-based fonts.  Another thing missing is that the
proprietary stroke-based solutions are only for the CJVK subset: I propose
extending that to cover the whole (well, at least Plane 0, the BMP) of
Unicode.

The proprietary stroke-based fonts take advantage of the obvious trick of
storing CJK radicals and other repeated character elements just once, so
an individual Hanzi or Kanji glyph is composed from several elements,
scaled appropriately.  An Open Source implementation would naturally want
to do the same.
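
To make the component idea concrete, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the names
COMPONENTS, GLYPHS and compose(), the unit-square coordinates, and the
placement tuple are all hypothetical and do not describe any existing
font format.

```python
# Hypothetical stroke data: each component is a list of polylines drawn
# in a unit square (0..1). It is stored once and shared by every glyph
# that refers to it.
COMPONENTS = {
    "tree": [
        [(0.0, 0.7), (1.0, 0.7)],   # horizontal stroke
        [(0.5, 1.0), (0.5, 0.0)],   # vertical stroke
    ],
}

# A glyph is a list of placements:
# (component name, x offset, y offset, x scale, y scale).
GLYPHS = {
    "single_tree": [("tree", 0.0, 0.0, 1.0, 1.0)],
    "two_trees": [
        ("tree", 0.0, 0.0, 0.5, 1.0),   # left half of the cell
        ("tree", 0.5, 0.0, 0.5, 1.0),   # right half of the cell
    ],
}


def compose(name):
    """Expand a glyph into its polylines, scaling and shifting each component."""
    strokes = []
    for comp, ox, oy, sx, sy in GLYPHS[name]:
        for polyline in COMPONENTS[comp]:
            strokes.append([(ox + x * sx, oy + y * sy) for x, y in polyline])
    return strokes
```

The glyph "two_trees" stores only two small placement records, yet it
expands to four strokes, which is where the size saving comes from.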

The proprietary stroke-based fonts, to my knowledge, only allow
unmodulated, unserifed fonts: "normal" vs. "bold".  That's more than
enough for any console.  But I would propose extending this to
include at least the kind of stroke modulation that you get when you use a
wide, flat calligraphy nib (i.e., when doing calligraphy by hand on
paper).  This makes your Arabic and other scripts look better, and perhaps
increases legibility for such scripts.  So the font format would allow for
using different "pens" for rendering different scripts: normal, bold,
wide-flat-at-45-degree-nib, etc.
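
As a rough sketch of what a "pen" could mean to a rasterizer: for an
idealized flat nib of zero thickness, the width of the mark depends only
on the angle between the nib and the direction of travel.  The function
names and parameters below are hypothetical and only illustrate that
geometry.

```python
import math


def flat_nib_thickness(dx, dy, nib_width, nib_angle_deg=45.0):
    """Thickness of a stroke segment drawn with an idealized flat nib.

    (dx, dy) is the direction of travel. A segment parallel to the nib
    comes out hairline-thin; a segment perpendicular to it gets the full
    nib width.
    """
    stroke_angle = math.atan2(dy, dx)
    nib_angle = math.radians(nib_angle_deg)
    return nib_width * abs(math.sin(stroke_angle - nib_angle))


def round_pen_thickness(dx, dy, diameter):
    """An unmodulated "normal" or "bold" pen: the same width in every direction."""
    return diameter
```

With a 45-degree nib of width 2, a stroke travelling along the nib has
(nearly) zero thickness, a stroke travelling across it has thickness 2,
and a horizontal stroke falls in between at about 1.41.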

While some might consider this as nonsense that won't work for old
VT100/VT220 terminals (or emulators) which require bitmapped fonts, it
seems to me that this approach would be the most elegant solution for
utf8-enabled xterm.  Isn't this internationalized Linux console going to
be on a graphics framebuffer anyway?

First, the font file(s) will be very small, smaller than what you can
achieve with bitmap or TTF/OpenType fonts.  You can even have
culturally-appropriate CJK glyph variants stuffed into a single
stroke-based font file and still come out with very small font file sizes
(see the Agfa Monotype or Bitstream web pages or white papers for
example file sizes).

Secondly, and IMHO most importantly, you can, ONCE-AND-FOR-ALL, solve the
common problem of having missing glyphs show up as open squares (or worse
...) because you can make the stroke-based font cover the whole range.
Suppose for a moment that FreeType2 were to support such a font format. When
an application like xterm requests FreeType2 to render a string, FreeType
could use the requested TTF font, only substituting the stroke-based
glyphs when needed.  So now everything is legible, at a slight cost of
possibly having to look at an alternate font design for the TTF-missing
glyphs.  Alternatively, an app can just request to use the nice,
clean-looking stroke-based font all the time.  It would be like using GNU
Unifont in your xterm, except it's not bitmapped.  And presumably
FreeType2 will have, or acquire, the smarts for rendering the Arabic and
Indic scripts properly.  For the Linux console on the framebuffer, it
seems to me you would always want to use the stroke-based fonts.
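
The substitution logic itself is simple.  The sketch below is NOT the
FreeType2 API; plan_runs() and the coverage set are hypothetical names,
and a real implementation would ask the font's character map instead.
It only shows how a string could be split into runs drawn from the
requested font and runs drawn from the stroke-based fallback.

```python
def plan_runs(text, primary_coverage):
    """Split text into (source, substring) runs.

    source is "primary" when the requested font covers the character and
    "stroke" when the stroke-based fallback has to supply the glyph.
    primary_coverage is any container of covered characters (an
    assumption made for this sketch).
    """
    runs = []
    for ch in text:
        source = "primary" if ch in primary_coverage else "stroke"
        if runs and runs[-1][0] == source:
            # Same source as the previous character: extend that run.
            runs[-1] = (source, runs[-1][1] + ch)
        else:
            runs.append((source, ch))
    return runs
```

For a font that covers only ASCII letters, a string containing one
Hebrew letter would come back as a primary run, a one-character stroke
run, and another primary run, so nothing is ever drawn as an open
square.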

********

BIDI AND VERTICAL SCRIPTS ON THE CONSOLE
----------------------------------------

Another issue that's been tossed about in this thread is the issue of BIDI
and vertical rendering.  I agree with others who have suggested that
a vertical top-to-bottom console is not necessary.  It would complicate
things too much.

Probably the Mongolians arrived at a horizontal rendering method for their
script after the introduction of computer technology or western/Cyrillic
printing technologies.  Historically, the Chinese were also accustomed to
print from top to bottom, and columns went from right to left.  But in the
modern period, it has become quite normal to write and print documents
horizontally left-to-right.  The traditional layout hasn't perished --
it's still quite common in newspapers, at least in Taiwan and HK (I think
Mainland China has settled on the horizontal LTR format more so than in
Taiwan and HK).  And as far as I can tell, no one in China, Taiwan, or
Hong Kong is any the worse off for writing LTR or having to deal with
mixed layouts on a daily basis.  So, probably this is the right answer for
the Mongolians too ... but of course, this should be confirmed with some
native Mongolian speakers.


On the other hand, BIDI should be supported.

In answer to Markus Kuhn and others who have asked in this thread how you
would have, for example, a RTL console for Hebrew or Arabic, I think the
answer is:

  1. Assume that the default behaviour of the console is as a
     horizontal-only, Left-To-Right console just as it is now.

  2. Add the ability to toggle between LTR and RTL typing to support
     RTL scripts like Arabic and Hebrew.  This should work something like
     it does in Gaspar Sinai's Yudit editor where you can select different
     input key maps and where you can select RTL vs. LTR typing (the
     only gripe I have with Yudit (v.2.5.2) is that the program doesn't
     automatically switch to RTL when I choose Arabic or Hebrew, which it
     should do). In Yudit, the characters are rendered RTL and all of the
     ligature shapes (for Arabic, Indic langs ...) appear automagically as
     you are typing.

  3. LISTING DIRECTORIES AND FILES:  Individual file and directory names
     should be rendered RTL for RTL scripts, but the tree hierarchy should
     still be displayed LTR, just as it is now.  Using "<--" to represent
     an RTL script, and "-->" to represent a LTR script, a directory tree
     would look like this:

        /
        /home
        /home/<--RTLdirname--/
        /home/<--RTLdirname--/<--RTLsubdir1name--/
        /home/<--RTLdirname--/<--RTLsubdir2name--/
        /home/<--RTLdirname--/--LTRsubdir3name-->/
        /home/<--RTLdirname--/--LTRsubdir3name-->/<--RTLsubsubdir1name--/
        ...

     This way, you show a directory tree of any complexity with any
     mixture of UTF-8 file names in any language.

     I asked a colleague of mine who is a native speaker of Persian (which
     is written using Arabic script) what she thought about this kind of
     convention: She thought this would be a reasonable solution for showing a
     hierarchical tree structure.  Note that this colleague
     has absolutely NO familiarity with UNIX or Linux *at all*, so
     her opinion cannot have been conditioned by prior familiarity with a UNIX
     command line interface (I used Yudit and the example of classifying
     different kinds of "kabob" hierarchically to explain the idea).
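
The directory listing convention in point 3 can be sketched as follows.
This is a deliberately naive illustration, not the Unicode
Bidirectional Algorithm: it decides direction from the first strong
character of each path component and reverses RTL components code point
by code point, ignoring combining marks, digits and Arabic shaping.
The function names are hypothetical.

```python
import unicodedata


def is_rtl(name):
    """True if the first strongly directional character is RTL."""
    for ch in name:
        cls = unicodedata.bidirectional(ch)
        if cls in ("R", "AL"):   # Hebrew-type or Arabic-type letter
            return True
        if cls == "L":           # Latin-type letter
            return False
    return False                 # no strong character: treat as LTR


def visual_path(path):
    """Return the path in visual (display) order.

    The separators and the order of the components stay LTR; only the
    characters inside an RTL component are reversed.
    """
    parts = path.split("/")
    return "/".join(p[::-1] if is_rtl(p) else p for p in parts)
```

Given a path whose second component is a Hebrew name, the result keeps
"/home/" on the left and the remaining LTR components on the right,
with only the Hebrew name reversed in place.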

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
