Benjamin Riefenstahl wrote:
> Hi Gregg,
Hi Benny,
Thanks for your reasoned reply. Comments below.
> Gregg Reynolds writes:
>> 1. It was legacy, so Unicode had to support it. Then they went
>> berserk with it.
> From my POV, there are very good reasons to consistently encode
> characters in the order in which they are written. You don't want
> visual layout for any other operation except display. You might think
> that display is the most important operation on text, but for large
> bits of most software it isn't.
Two things. One is, directionality is a design choice, not a reflection of
some kind of objective reality. This is obvious if you stare at some
RTL text and think for a while. However, the Unicode book claims that
RTL languages are "inherently" bidirectional. This is hogwash.
Second, "the order in which [characters] are written" is not relevant to
an encoding model. There is no necessary relationship between the IO
model implemented by an application and the corresponding textual
representation, which is application independent. Specifically, your
editor can support data entry of digit strings as either LSD-first or
MSD-first, or both. Neither data entry protocol has anything to do with
the way the data is encoded in persistent storage. For that matter, the
internal encoding of an editor is independent of the data exchange
formats it im/exports. Emacs being a great example of that.
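To make the digit-string point concrete, here is a minimal sketch in Python. It is purely illustrative: the function name `enter_digits` and the decision to store digits MSD-first are assumptions made for the example, not a description of how Emacs or any other editor works.

```python
def enter_digits(keystrokes, entry_order):
    """Build the stored digit string from keys typed in the given order.

    entry_order is "msd-first" or "lsd-first". The stored form is always
    most-significant-digit first, an arbitrary choice for this sketch.
    """
    stored = []
    for key in keystrokes:
        if entry_order == "msd-first":
            stored.append(key)       # new digit goes after the earlier ones
        elif entry_order == "lsd-first":
            stored.insert(0, key)    # new digit goes before the earlier ones
        else:
            raise ValueError(f"unknown entry order: {entry_order}")
    return "".join(stored)


# The same number typed under the two data entry protocols.
typed_msd_first = enter_digits("1984", "msd-first")
typed_lsd_first = enter_digits("4891", "lsd-first")
```

Both calls leave the same string in storage, so the order of keystrokes says nothing about the stored order. Storing LSD-first instead would be just as easy; that too is a design choice.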
In other words "reasons to consistently encode characters in the order
in which they are written" is essentially meaningless. (I say that as
a statement of fact, not as a flame.)
> You might think that RTL without bidi would be enough. But once you
> have RTL, it becomes the job of the Unicode standard to define how
> mixed content is handled. Mixed content is after all the driving
> force for Unicode in the first place. I also think that most users
Hmm. I think that's debatable. I think unification of diverse encoding
schemes is the primary driver behind Unicode, but that's a digression.
More important is that RTL has no necessary relationship to mixed
content or bidi reordering. If you only ever write documents in Arabic
(Hebrew, Persian, Pashto, whatever) then why do you need bidi? You
don't; it's an unfortunate artifact of Western-driven standardization.
To be clear: monolingual Arabic text is not mixed content, whether it
contains digit strings or not. So why should an Arabic user pay the
Unicode tax of bidi support?
Don't get me wrong, I'm not saying the bidi algorithm is not useful or
nice to have. But it's an add-on, not needed by the vast majority of
RTL documents produced in the world. Yes, believe it or not, Arabs and
other RTL users actually don't need English, any more than we English
speakers need Arabic. To this day, scholarly writings about Arabic in
English use transliteration. Arabic is quite capable of the same, even
for acronyms like IBM or CIA.
It boils down to an economic argument. For Arabic, we need a) RTL
layout (a purely graphical matter); and b) shaping. Both of these are
(relatively) inexpensive to implement. Support for bidi reordering is a
nice enhancement, but it's a) expensive; and b) unnecessary unless you
write in two or more languages in the same doc.
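As a rough illustration of why plain RTL layout is cheap, here is a sketch in Python. It assumes fixed-width cells, a single line, and no shaping; the names `layout_rtl` and `render_row` are made up for the example. Latin letters stand in for RTL characters so the result is easy to read in any mail client.

```python
def layout_rtl(text, line_width):
    """Place characters, in stored order, starting at the right margin.

    Returns (column, char) pairs. Nothing is reordered: the first stored
    character simply lands in the rightmost cell.
    """
    if len(text) > line_width:
        raise ValueError("text does not fit on one line")
    return [(line_width - 1 - i, ch) for i, ch in enumerate(text)]


def render_row(text, line_width):
    """Draw one row of cells as a string, with column 0 at the left."""
    row = [" "] * line_width
    for column, ch in layout_rtl(text, line_width):
        row[column] = ch
    return "".join(row)
```

The whole thing is one pass over the text with one subtraction per character. There are no embedding levels, no character classes, and no run resolution, which is where the cost of bidi reordering comes from.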
Ask yourself a simple question. Software like Emacs has been around for
what, 30 years? It gained support for e.g. Japanese, Korean, etc. years
ago. But the 1 billion + people in the world who need RTL support are
still waiting. Why is that? IMHO, it's at least partially because of
the perceived but false association of RTL and bidi. (I can cite
specific examples of vendors declining to support Arabic solely because
of the expense of implementing bidi support.) The bidi algorithm is
complex and generally yucky. Thought experiment: imagine a world in
which nobody would implement English language software unless it had
bidi support.
Sincerely,
-gregg