On Fri, Sep 01, 2006 at 04:32:40PM +1000, George W Gerrity wrote:
> I did try to tell you that doing a terminal emulation properly would
> be complex. I don't know if the algorithm is broken: I doubt it. But
> it is difficult getting it to work properly and it essentially
> requires internal tables for every glyph describing its direction and
> orientation.
If that were the problem it would be trivial. The problems are much
more fundamental. The key examples you should look at are things like:
printf("%s %d %d %s\n", string1, number2, number3, string4); where the
output is intended to be columnar. Everything is fine until someone
puts in data where string1 ends in RTL text and string4 begins with
RTL text, in which case the numbers switch places. This kind of
instability is not just awkward; it shows that implicit bidi is
fundamentally broken. Even if it can be handled at the terminal
emulator level with special escapes and whatnot (and I believe it can,
albeit in very ugly ways) it simply cannot be handled in a plain text
file, for reasons like:
columna COLUMNB 1234 5678 columnc
columna COLUMNB 1234 5678 COLUMNC
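To make the reordering concrete, here is a toy model in C. It is not the real Unicode bidirectional algorithm, only a rough sketch under stated assumptions: uppercase ASCII stands in for RTL letters and lowercase for LTR letters (the convention of the two sample lines above), the base direction is LTR, and only simplified versions of the rules for digits, neutrals, and run reversal are applied. The names toy_bidi, classify, and reverse_span are made up for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define TOY_MAX 256

enum toy_type { T_L, T_R, T_EN, T_N };  /* LTR letter, RTL letter, digit, neutral */

/* Assumption of this sketch: uppercase ASCII plays the role of RTL letters. */
static enum toy_type classify(unsigned char c)
{
    if (isupper(c)) return T_R;
    if (islower(c)) return T_L;
    if (isdigit(c)) return T_EN;
    return T_N;
}

/* Reverse the bytes of s[lo..hi) in place. */
static void reverse_span(char *s, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        char tmp = s[lo];
        s[lo++] = s[--hi];
        s[hi] = tmp;
    }
}

/* Write the display order of `in` to `out`; out needs strlen(in) + 1 bytes. */
static void toy_bidi(const char *in, char *out)
{
    enum toy_type t[TOY_MAX];
    int level[TOY_MAX];
    size_t n = strlen(in), i, j;
    enum toy_type strong = T_L;          /* base direction is LTR */

    assert(n < TOY_MAX);

    /* Digits whose nearest preceding letter is LTR (or absent) act as LTR. */
    for (i = 0; i < n; i++) {
        t[i] = classify((unsigned char)in[i]);
        if (t[i] == T_L || t[i] == T_R)
            strong = t[i];
        else if (t[i] == T_EN && strong == T_L)
            t[i] = T_L;
    }

    /* A run of neutrals turns RTL only if both neighbours are RTL or digits. */
    for (i = 0; i < n; ) {
        if (t[i] != T_N) { i++; continue; }
        for (j = i; j < n && t[j] == T_N; j++)
            ;
        int rtl = i > 0 && t[i - 1] != T_L && j < n && t[j] != T_L;
        while (i < j)
            t[i++] = rtl ? T_R : T_L;
    }

    /* Levels: LTR = 0, RTL = 1, digits inside RTL text = 2. */
    for (i = 0; i < n; i++)
        level[i] = t[i] == T_L ? 0 : t[i] == T_R ? 1 : 2;

    /* Reverse every run at level >= lvl, highest level first. */
    memcpy(out, in, n + 1);
    for (int lvl = 2; lvl >= 1; lvl--) {
        for (i = 0; i < n; ) {
            if (level[i] < lvl) { i++; continue; }
            for (j = i; j < n && level[j] >= lvl; j++)
                ;
            reverse_span(out, i, j);
            i = j;
        }
    }
}
```

Fed the two sample lines, this model produces:

    columna 5678 1234 BNMULOC columnc
    columna CNMULOC 5678 1234 BNMULOC

The numbers trade places in both lines, and the field that was second in the file lands in a different screen column depending on what the last field happens to contain.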
Implicit bidi requires interpreting a flow of plain text as
sentence/paragraph content which is simply not a reasonable
assumption. Consider also what would happen if your text file is two
preformatted 32-character-wide paragraph columns side-by-side. Now
imagine the kind of havoc that could result if this sort of insanity
took place in the presentation of configuration files with critical
security settings, for instance where the strings are usernames (which
MUST be able to contain any letter character from any language) and
the numbers are permission levels. And certainly you can't just throw
explicit direction markers into a config file like that because they'd
alter the semantics (which should be purely byte-oriented; there's no
reason any program not displaying text should include code to process
the contents).
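A small C sketch of why a direction marker changes the meaning of such a file. The table layout and the names perm_entry, lookup_level, and marked_user are hypothetical; the only assumption about the program is that it compares usernames byte for byte, as a program that never displays text reasonably would.

```c
#include <stddef.h>
#include <string.h>

struct perm_entry {
    const char *user;   /* username exactly as stored, as raw bytes */
    int level;          /* permission level */
};

/* Return the level for `user`, or -1 when no entry matches byte for byte. */
static int lookup_level(const struct perm_entry *tab, size_t n, const char *user)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(tab[i].user, user) == 0)
            return tab[i].level;
    return -1;
}

/* "alice" followed by U+200E LEFT-TO-RIGHT MARK, which is E2 80 8E in UTF-8. */
static const char marked_user[] = "alice\xE2\x80\x8E";
```

With a table holding an entry for "alice", looking up marked_user finds nothing: the marker is invisible on screen but adds three bytes, so to the program it is simply a different username.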
One of the unacceptable things that the Unicode consortium has done
(as opposed to ISO 10646 which, after their initial debacle, has been
quite reasonable and conservative in what they specify) is to presume
they can redefine what a text file is. This has included BOMs, a
paragraph break character, implicit(?) deprecation of the newline
character as a line/paragraph break, etc. Notice that all of these
redefinitions have been universally rejected by *NIX users because
they are incompatible with the *NIX notion of a text file. My view is
that implicit bidi is equally incompatible with text files and should
be rejected for the same reasons.
This does not mean that storing text in 'visual order' is acceptable
either; that's just disgusting and makes correct ligatures/shaping
impossible. It just means that you cannot create a bidirectional
presentation from a text file without higher level markup. Instead you
can use a vertical presentation or either LTR or RTL presentation with
the opposite-directionality glyphs rotated 180°.
My observations were that this sort of presentation is much easier to
edit and quite possibly easier to read than a format where your eyes
have to switch scanning directions.
I'm not unwilling to support implicit bidi if somebody else wants to
code it, but the output WILL BE WRONG in many cases and thus will be
off by default. The data needed to do it correctly is simply not
there.
> > [...]
> >[1] There is a small problem that even without LTR scripts mixed in,
> >most RTL scripts are "bidirectional" due to numbers being written LTR.
> >However supporting reversed display of individual numbers (or even
> >individual words) is a trivial problem compared to full bidi text flow
> >and can be done without compromising reversibility and without complex
> >algorithms that cause misinterpretation of adjacent text.
>
> No one using arabic script would accept reading it top to bottom: it
> is simply never done (to the best of my knowledge), and so any
> terminal emulator claiming to work with any script had better be able
> to render the text correctly, including mixing rtl and ltr.
You misread the above. Of course no one using LTR scripts would want
to read top-to-bottom either. The intent is that users of RTL scripts
could use an _entirely_ RTL terminal with the LTR characters' glyphs
rotated 180° while LTR users could use an _entirely_ LTR terminal with
RTL glyphs rotated 180°. The exception noted in the footnote is that
RTL scripts actually require "bidi" for numbers, but as I noted there,
this is trivial compared to full bidi and suffers from none of its
fundamental problems.
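A minimal sketch in C of how small the number handling can be, assuming ASCII digits only and a terminal that fills its cells from the right edge. The function name flip_digit_runs is made up here, and real RTL text would need the same idea applied to code points rather than bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Reverse each maximal run of ASCII digits in place.
 * Applying the function twice restores the original string. */
static void flip_digit_runs(char *s)
{
    size_t n = strlen(s), i = 0;

    while (i < n) {
        if (!isdigit((unsigned char)s[i])) { i++; continue; }

        /* Find the end of this digit run, s[lo..hi). */
        size_t lo = i, hi = i;
        while (hi < n && isdigit((unsigned char)s[hi]))
            hi++;
        i = hi;                      /* continue scanning after the run */

        /* Swap the digits end for end. */
        while (lo + 1 < hi) {
            char tmp = s[lo];
            s[lo++] = s[--hi];
            s[hi] = tmp;
        }
    }
}
```

Each number is touched in isolation, so adjacent text is never reinterpreted, and because the operation is its own inverse the logical order can always be recovered from what was displayed.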
The vertical orientation thing is mostly of interest to Mongolian
users and perhaps some East Asian users, but it could also be
interesting to (a very few) users of both LTR and RTL scripts who use
both frequently and who want a more equal treatment of both,
especially if they find reading upside-down difficult.
Rich
P.S. Do you have any good screenshots with RTL or LTR embedded text?
If so I can prepare some modified images to show what I mean and you
can see what you think of readability.