[Which list?] Unicode, bidi, terminals and tables

Beni Cherniavsky Mon, 03 Mar 2003 04:08:29 -0800

[ Long mail ]
[I'm not sure which list is the most appropriate for discussing this;
an internet search shows many lists where these issues are discussed]


I've been thinking for a long time about proper bidi in terminals.
I saw somebody on some list [mlterm's?] put it this way: the
terminal's purpose is to simplify writing applications [dealing with
text].  I heartily agree - it's vastly easier to use printf than to
draw strings in X :-)

I'm looking for the Right Model of terminals that would support bidi
well, with minimal hassles to the application.  In particular, it's
imperative that `cat foo.txt' does the right thing.  I don't care much
for the existing vt100 model; I'd actually like to take the
opportunity to throw most of its complexity out of the window, but
that's not a requirement; bidi is.

The model I'm thinking of is based not on a visual-ordered matrix of
characters but rather on a logical stream of text.  The terminal
renders the text at any point in time (through bidi, shaping, etc.);
non-dumb applications can change this text in the terminal (through
some escape sequences, probably).  This mail comes close:
http://mail.nl.linux.org/linux-utf8/2002-10/msg00006.html

This model stems much more naturally from the requirement that `cat
foo.txt' does the right thing.  So this moves the discussion to
another level: "plain text" unicode is not yet adequate to represent
many kinds of text that were easily handled by fixed-width ascii.
These are most notably mathematical formulae (with their nestable
horizontal and vertical layouts and big characters) and all kinds of
tables (where texts are shown side by side).  Leaving math alone (it's
harder), here is the problem with tables:

Many simple unix utils display texts side-by-side: paste, sdiff,
ls -l, etc.

* They assume fixed-width display.  This requires fixing to use
  wcwidth() and still is fragile.  Perhaps it should be replaced by
  some convention of signalling the terminal "align this under this"
  that does not require the application to count anything at all.

* As-is, they interact horribly with unicode's bidi algorithm.  Unless
  something signals that every column is a separate "paragraph",
  columns would get reordered and the reader will be confused.  Also,
  it's usually useful to auto-detect the global direction of each line
  by the first strongly-directional character; this would mirror some
  lines of the table completely.  The best example is probably `sdiff'
  - I think that even the <|> indicators would be mirrored, so that in
  English lines the first file is on the left and on Hebrew lines it's
  on the right of the diff, without any more hints.

* Unicode embedding marks could help here (by embedding every
  column), except that they require you to know the directionality of
  every cell to emit the correct marks.  It might be acceptable to
  expect `sdiff' to output some markers but it's unreasonable to
  expect it to check the first strong char in every line!  It'd be
  most useful to have every cell implicitly directioned.  I consider
  this a shortcoming of the unicode embedding model.  Instead they
  should have specified only a neutral embedding mark, and when you
  know better, you put LRM/RLM as the first char of the embedded text.
  This would allow applications to blindly embed text, while allowing
  the text itself to signal its directionality...
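
To make the three points above concrete, here is a rough Python sketch
of the work a side-by-side utility would have to do today.  All
function names are made up for illustration.  display_width() is only
an approximation of wcwidth() built on Python's unicodedata module,
and whether the output actually looks right depends on the bidi
support of whatever displays it.

```python
import unicodedata

LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # Unicode embedding marks
LRM = "\u200e"                                # left-to-right mark


def display_width(text):
    """Approximate the number of terminal cells text occupies."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # non-spacing marks and format controls take no cell
        # Wide and fullwidth East Asian characters take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def first_strong_direction(text, default="L"):
    """Return 'L' or 'R' for the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "L"
        if bidi in ("R", "AL"):
            return "R"
    return default  # no strong character found (e.g. digits only)


def embed_cell(text):
    """Wrap one cell in the embedding mark matching its own direction."""
    mark = RLE if first_strong_direction(text) == "R" else LRE
    return mark + text + PDF


def format_row(cells, widths):
    """Pad each cell to its column width, then embed it separately."""
    parts = []
    for cell, width in zip(cells, widths):
        padding = " " * max(0, width - display_width(cell))
        parts.append(embed_cell(cell + padding))
    # The LRM in the separator is meant to stop two adjacent
    # right-to-left cells from merging into a single run.
    return (" " + LRM + "| ").join(parts)


if __name__ == "__main__":
    print(format_row(["foo.txt", "\u05e9\u05dc\u05d5\u05dd"], [10, 10]))
```

Note that the utility has to scan every cell for its first strong
character and count widths itself, which is exactly the burden I'd
like the terminal to take over.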

So there is a need for extending "plain-text" unicode with some extra
semantics that will allow me to express a table (at least as far as
bidi should work).  I found some old discussions on the unicode
mailing list that led nowhere.  I don't aim at converting the whole
world to use these extensions; I just want something that can be used
at the command line in interactions between different unix utils (just
like there is no end-of-line consensus but unix works happily with
\n).

I want to be able to `cat table.txt'; having to use visual apps for
that is a no-no.  In other words: the only differences between a
full-screen and a dumb application should be that the former can go
back and modify its output (and that it's aware how much of its output
fits the screen); there must be no difference in the convenience of
presenting the data.

All this gives me different ideas, most leading to some nested box
models.  I'm not sure yet what's the right direction.  Comments,
ideas, requests to elaborate or just redirections to the appropriate
mailing list :-) are welcome.

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

pure virtual static warp shield (TNG++, All Good Things O-=)

