Re: [Which list?] Unicode, bidi, terminals and tables

Beni Cherniavsky Mon, 03 Mar 2003 08:46:59 -0800

On 2003-03-03, Nadav Har'El wrote:

> On Mon, Mar 03, 2003, Beni Cherniavsky wrote about "[Which list?] Unicode, bidi, 
> terminals and tables":
> > I'm looking for the Right Model of terminals that would support bidi
> > well, with minimal hassles to the application.  In particular, it's
> > imperative that `cat foo.txt' does the right thing.  I don't care much
>
> Are you sure you know what the "right thing" for cat foo.txt to do is?
> It's more complicated than it sounds, especially when you're talking about
> mixed-language text (e.g., Hebrew and English). Do you want the "main
> direction" to be always LTR? be figured out per-line (as, for example,
> QT seems to do)? Per paragraph?
>
One thing should be chosen and declared the "right thing".  The
minimum requirement is to support plain unicode consisting of paragraphs
of text, with LRM/RLM and embedding escapes.
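For reference, these "escapes" are ordinary Unicode format characters.  A minimal Python sketch listing them, with two hypothetical helpers (embed() and strip_bidi(), names made up for illustration) showing how a utility in a pipe might add or drop them:

```python
# Unicode directional formatting characters (names as in the Unicode standard).
LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (ends an embedding/override)
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
LS = "\u2028"   # LINE SEPARATOR
PS = "\u2029"   # PARAGRAPH SEPARATOR

BIDI_CONTROLS = {LRM, RLM, LRE, RLE, PDF, LRO, RLO}

def embed(text: str, rtl: bool) -> str:
    """Hypothetical helper: give `text` an explicit direction inside its paragraph."""
    return (RLE if rtl else LRE) + text + PDF

def strip_bidi(text: str) -> str:
    """Hypothetical helper: drop directional controls (LS/PS are kept), e.g. before grepping."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)
```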


> My "bidiv" program attempts to be such a "cat"-like program which works
> on normal non-bidi terminals (i.e., it does the bidi itself). Its default
> heuristic is to decide the direction per-paragraph, but I'm very skeptical
> that a terminal emulator can use this kind of heuristics without annoying
> a lot of people.
>
Can you do any better?  If the application is smart enough, it can
emit the unicode line/paragraph separators to override the guess.

Per-line, like in mlterm, seems more convenient to me in command-line
contexts, and when editing mixed text.
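Both heuristics boil down to the "first strong character" rule of the Unicode bidi algorithm (UAX #9, rules P2 and P3); they differ only in the unit the rule is applied to.  A rough Python sketch comparing the two, assuming for the per-paragraph case that paragraphs are separated by blank lines (in UAX #9 itself every newline already ends a paragraph).  first_strong() and line_directions() are illustrative names, and isolates are not handled:

```python
import unicodedata

def first_strong(text):
    """Direction of the first strong character ('L' or 'R'), or None (UAX #9 P2, simplified)."""
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "L"
        if cls in ("R", "AL"):  # Hebrew letters are R, Arabic letters are AL
            return "R"
    return None

def line_directions(text, per_line, default="L"):
    """Guess a base direction for every line, either per line or per blank-line paragraph."""
    result, para = [], []

    def flush():
        # The whole paragraph takes the direction of its first strong character.
        direction = first_strong("".join(para)) or default
        result.extend([direction] * len(para))
        para.clear()

    for line in text.split("\n"):
        if per_line:
            result.append(first_strong(line) or default)
        elif line.strip() == "":
            flush()
            result.append(default)  # the blank separator line itself
        else:
            para.append(line)
    flush()
    return result
```

For the two-line input "\u05e9\u05dc\u05d5\u05dd\nhello", per-line guessing gives R then L, while per-paragraph guessing makes the English line RTL as well.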

> > for the existing vt100 model; I'd actually like to take the
> > opportunity and throw most of its complexity out of the window but
> > that's not a requirement, the bidi is.
>
> Most of the vt100 "complexity" is used by cursor-addressed programs, like
> for example vi and mutt. Some redundant-looking escape sequences (e.g.,
> "delete from the cursor to the end of line") are used to lower the number of
> characters that need to be output to modify a screen (libraries like "curses"
> specialize in using this efficiently), and they were very important in the
> days of slow modems (remember 110 baud? :)), and to some degree they still
> are. I don't know why you'd want to get rid of these features,
> especially as they already exist (e.g., in "xterm").
>
I surely don't want to eliminate full-screen programs (plan9 does but
I disagree with them that graphics should be used for that; this
raises the bar for such applications unnecessarily).  In fact I want a
richer terminal than today, I'd rather see more terminal and less GUI
apps.

Instead of the application moving the cursor and changing text on the
screen, it would move the cursor and change text in a logical text
buffer and that would be reflected on the screen.  It's simply a way
to formulate the effects of escape sequences that ensures bidi
support will stay well-defined, consistent and predictable with
full-screen applications.  Actually, I believe any terminal doing
bidi today works this way; it's the only way to keep your sanity :-).
They just happen to use vt100 as the logical model and I think that
could be improved upon.  I think the ECMA bidi terminal standard
borders on the physical model but I'm not sure I understood it
right (not that I ever heard of somebody who is :-).

I'm OK with necessary bandwidth optimizations.  My point is that a lot
of this complexity is also due to backward compatibility.  Many
things about terminals could be rethought if I'm going to create
something incompatible with vt100 anyway.  [This is a tempting area
and I don't want it to prevent me from achieving the bidi...]

* Input encodings - let UTF-8 be the One Encoding.

* Keyboard input:

  - The raw/cooked modes are not very elegant; cooked mode actually
    belongs in user-space.  9term had an elegant idea of user-triggered
    cooked mode - you press Esc, then you can type and edit multi-line
    text locally, when you press Esc again, it's sent.

  - Bucky bits: there are many character combinations I can press on
    my keyboard and the mapping onto ASCII + function keys out of a
    predefined set is not satisfactory.  I'd rather have prefixes sent
    for each bucky bit, like ESC now does for meta.  I'd also prefer
    an open set of function keys (e.g. sent by name) to the fixed
    table in curses.h, which has shift variants for only some of them.

* Long lines.  Example: readline w.r.t. line wrapping.  If readline is
  mistaken about the true terminal width, it becomes almost completely
  unusable.  I'd want the terminal to support longer-than-the-screen
  logical lines, so applications that only edit a single line don't
  need to concern themselves with multiple lines at all.

  - This can be implemented even in the matrix model with one bit for
    every physical line marking whether it continues on the next
    physical one; xterm has it anyway for correct mouse selection; I even
    think that some terminals get it completely right already (but I
    don't think vt100 is one of them).

* Attributes.  I don't need A_PROTECTED nor A_INVISIBLE but I'd
  welcome true color, several fonts (italic, etc.), etc.  The limit
  needs to be put somewhere; I don't want a full Gecko (or something
  like XMLterm), unless it comes for free.  I probably want an
  extensible system, capable of describing different hardware.

  - I also want some video card that would finally do unicode,
    antialiasing and true color in text mode.  I guess I'll wait a long
    time, since everybody is busy crunching 3D polygons instead :-(.

* Embedding: here is a good indicator for complexity.  You say there
  is no reason to discard complexity that's already implemented in
  xterm.  To this I ask: where else is it implemented?  Only in
  various terminal emulators (which have to do it by definition),
  GNU screen and emacs' M-x term (and now Akka).

  I want to have the terminal equivalent of XEmbed.  An easy way to
  do so would have a great impact: one could start to compose
  terminal applications almost as easily as unix pipes.  Think
  pine/slrn/whatever embedding an external editor without it taking
  over the whole screen and without the need to exit it and then
  decide whether to send.  Think user-mode replacement for the dumb
  cooked mode editing.  Think [multiple] levels of filtering, for e.g.
  syntax highlight or anything else you can imagine.

  I'd really like to play with such architectures but currently it's
  impractical - your only way is to implement a full terminal emulator
  and run the embedded app under it, which is where the complexity
  becomes prohibitive.  There are two approaches to simplify it:

  - Have the set of escape sequences and the logical model
    sufficiently simple that implementing it would be easy.

  - Think out the set of escape sequences so that the embedding
    application could simply pass-through the stream from the embedded
    application to the terminal, with only superficial adjustments
    (like offset all absolute coordinates).

* Termcap/info - I'm not sure this is needed; if a single set of
  escapes defined once (or with revisions) will do, that's preferred
  (but I do see the future compatibility value of such a thing).

* There were more but this is enough for now :-)
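Two of the items above are concrete enough to sketch.  The Python below is an illustration under stated assumptions, not how any existing terminal does it.  The first half keeps one continuation bit per physical row, so logical lines can be recovered and rewrapped when the width changes (assuming one column per character).  The second half is the kind of "superficial adjustment" a pass-through embedder would make: shifting absolute cursor moves (CUP, ESC [ row ; col H, and HVP, the same with f) by the sub-window's offset.  Row, wrap(), logical_lines() and offset_cursor_moves() are made-up names:

```python
import re
from dataclasses import dataclass

# --- Long lines: one "continues on the next row" bit per physical row ---

@dataclass
class Row:
    text: str
    wrapped: bool  # True if this physical row continues on the next one

def logical_lines(rows):
    """Join physical rows back into logical lines using the continuation bit."""
    lines, current = [], ""
    for row in rows:
        current += row.text
        if not row.wrapped:
            lines.append(current)
            current = ""
    if current:  # last row claimed to continue; keep what we have
        lines.append(current)
    return lines

def wrap(lines, width):
    """Split logical lines into rows of at most `width` characters (1 column per char assumed)."""
    rows = []
    for line in lines:
        chunks = [line[i:i + width] for i in range(0, len(line), width)] or [""]
        for k, chunk in enumerate(chunks):
            rows.append(Row(chunk, wrapped=k < len(chunks) - 1))
    return rows

# --- Embedding: shift absolute cursor moves coming from an embedded app ---

# CUP (ESC [ row ; col H) and HVP (ESC [ row ; col f); a missing or 0 parameter means 1.
CURSOR_MOVE = re.compile(r"\x1b\[(\d*)(?:;(\d*))?([Hf])")

def offset_cursor_moves(data, row_off, col_off):
    """Rewrite absolute cursor positioning so the embedded app lands in its sub-window."""
    def shift(m):
        row = max(int(m.group(1) or 1), 1) + row_off
        col = max(int(m.group(2) or 1), 1) + col_off
        return f"\x1b[{row};{col}{m.group(3)}"
    return CURSOR_MOVE.sub(shift, data)
```

A real embedder would also have to handle the other absolute sequences (scrolling regions, CHA, VPA), clip output to its window and cope with sequences split across reads; that remaining work is the complexity the pass-through approach hopes to design away.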

> > The model I'm thinking of is based not on a visual-ordered matrix of
> > characters but rather on a logical stream of text.  The terminal
>
> Yes, what you describe is a teletype (see
> http://www.columbia.edu/acis/history/teletype.html) - printers used
> before the advent of CRT terminals, which did not have cursor
> addressing. Do you think anybody wants to return to this? Feel like
> going back to "ed" for editing? :) I certainly don't.
>
> That being said, if you believe that cursor-addressing applications are
> "passe", and people only use graphical applications (openoffice, emacs,
> mozilla, etc.) nowadays, and the xterm window is only used for commands
> like "cd", "ls", "cat", your idea might have merit. But with your
> permission, I'll disagree.
>
On the contrary, I disagree with what you think I meant, too :-).  See
above.

> > * They assume fixed-width display.  This requires fixing to use
> >   wcwidth() and still is fragile.  Perhaps it should be replaced by
> >   some convention of signalling the terminal "align this under this"
> >   that does not require the application to count anything at all.
>
> Remember tab stops? This is what they were for :)
>
Yes, I thought of them too.  I didn't check whether e.g. mlterm treats
them as paragraph separators (as far as bidi is concerned).  I think I
could live with simple non-nestable tables, that could be defined in
terms of tabs (at the cost of some incompatibility with accepted tab
meaning).
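For comparison with tab stops, this is roughly what "fixing to use wcwidth()" involves, and why it stays fragile: the application and the terminal must agree on a width for every character.  A Python approximation using only unicodedata (the standard library has no wcwidth()): East Asian wide and fullwidth characters count as two columns, combining marks and format controls such as LRM/RLM as zero, and everything else, including East Asian "ambiguous" characters, as one.  display_width() is an illustrative name, and a real wcwidth() may well disagree on edge cases:

```python
import unicodedata

def display_width(text):
    """Approximate terminal columns used by `text` (not a real wcwidth())."""
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # combining marks and format controls (LRM, RLM, ...) take no column
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide / fullwidth, e.g. CJK ideographs
        else:
            width += 1  # everything else, including "ambiguous" characters
    return width
```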

> >   every cell to emit the correct marks.  It might be acceptable to
> >   expect `sdiff' to output some markers but it's unreasonable to
> >   expect it to check the first strong char in every line!  It'd be
>
> Maybe the conclusion is that an application which wants to diff
> multi-lingual multi-directional text needs to be a little smarter than
> sdiff, and it's not an issue of terminal emulator at all?
> And what about diffing texts which contains left-right English, right-left
> Hebrew, and top-down Japanese? It seems to me that some problems are not going
> to be solved in the terminal-emulator level.
>
I want to work with unix pipes.  I want the program to process text,
as simple as possible.  It should do the minimum necessary for
handling unicode sensibly but not more.  I need some semantics of the
stream of data that passes between applications.  I could even live
with an extra application to render the final table on the screen but
I want to define the simple output that sdiff emits.
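If tabs do become cell separators, the per-cell work objected to above (checking the first strong character of every cell) can move out of sdiff into one filter, or into the terminal.  A minimal sketch of one possible convention, not an agreed one: each tab-separated cell is wrapped in an explicit embedding chosen from its own first strong character, and a leading LRM keeps the columns ordered left to right.  cell_direction() and mark_table_line() are hypothetical names:

```python
import unicodedata

LRM, LRE, RLE, PDF = "\u200e", "\u202a", "\u202b", "\u202c"

def cell_direction(cell, default="L"):
    """Base direction of one cell: its first strong character, else `default`."""
    for ch in cell:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "L"
        if cls in ("R", "AL"):
            return "R"
    return default

def mark_table_line(line):
    """Give every tab-separated cell its own embedding; the LRM makes column order LTR."""
    cells = line.split("\t")
    marked = [(RLE if cell_direction(c) == "R" else LRE) + c + PDF for c in cells]
    return LRM + "\t".join(marked)
```

With this split, sdiff itself only has to emit tabs; the filter adds the marks.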

> > So there is a need for extending "plain-text" unicode with some extra
> > semantics that will allow me to express a table (at least as far as
> > bidi should work).  I found some old discussions on the unicode
>
> Plain text should not have tables.
>
Probably "plain text" was a bad term.  Between the unicode notion of
plain text, consisting of paragraphs of textual info, and full-blown
full-screen cursor-addressing applications, I want some level of
"terminal text" that allows me to send a table to the terminal so
that it displays right.  This "terminal text" is what I can `cat' and
what the utilities that I combine in nice unix pipes will process.  So
it better be simple to process.

> Higher-level text files, like HTML, TeX or Troff sources, XML OpenOffice
> files, etc., do contain tables and have their own mechanisms for specifying
> the language in each column and so on.
>
.. and then you run lynx or something to render these tables onto
your terminal.  How do you code lynx?  Full-screen programs suffer from
the same bidi issues too, in fact they have even harder time!

Since I want to make creating terminal applications easier, I want to
minimize the entry cost of working with bidi properly.  So expecting
full-screen programs to disable implicit bidi and do it all by
themselves is out of the question.

The best current model is that the application edits a rectangular
matrix of chars (thinking that's a vt100) that is rendered through
line-by-line implicit bidi.  This breaks as soon as the application
puts text side-by-side that is not a logical continuation.  I don't
think dialogs will survive it if they contain mixed hebrew/english
text.  The worst scenario is a pop-up window covering some of the
mixed text below it.  If at least the content of the pop-up window
stays in place, that would already be a miracle :-).

That's why I'm unsatisfied with this model.  The terminal has no idea
about the nesting of 2d areas of the application, so it can't help it
with the bidi.
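A toy illustration of the breakage.  This is not UAX #9 (numbers, mirroring, embeddings and most rules are ignored); it keeps only one idea, for an LTR line: a neutral between two right-to-left characters becomes right-to-left, then each right-to-left run is reversed.  Two Hebrew cells separated by a " | " pane border therefore merge into a single run and swap sides on screen.  strong() and toy_visual() are made-up names:

```python
import unicodedata

def strong(ch):
    """'L', 'R', or None; in this toy everything that isn't strong counts as neutral."""
    cls = unicodedata.bidirectional(ch)
    if cls == "L":
        return "L"
    if cls in ("R", "AL"):
        return "R"
    return None

def toy_visual(line):
    """Toy LTR-line reordering: resolve neutrals, then reverse each RTL run."""
    dirs = [strong(ch) for ch in line]
    resolved = []
    for i, d in enumerate(dirs):
        if d is None:
            # Nearest strong direction on each side; the line edges count as L.
            before = next((x for x in reversed(dirs[:i]) if x), "L")
            after = next((x for x in dirs[i + 1:] if x), "L")
            d = "R" if before == after == "R" else "L"
        resolved.append(d)
    out, i = [], 0
    while i < len(line):
        j = i
        while j < len(line) and resolved[j] == resolved[i]:
            j += 1
        run = line[i:j]
        out.append(run[::-1] if resolved[i] == "R" else run)
        i = j
    return "".join(out)
```

For the logical line "\u05d0\u05d1 | \u05d2\u05d3" the toy returns "\u05d3\u05d2 | \u05d1\u05d0": the text of the left pane now appears on the right, because the terminal only sees one line, not two panes.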

> > mailing list that lead nowhere.  I don't aim at converting the whole
> > world to use these extensions; I just want something that can be used
> > at the command line in interactions between different unix utils (just
> > like there is no end-of-line consensus but unix works happily with
> > \n).
>
> If you produce something that has real-world usefulness, I'll use it :)
>
> Note that what you described does not need to be a terminal-emulator at
> all! It can, and perhaps even should, be simply a pseudo-terminal layer
> which processes the text and passes it to a normal, bidi-agnostic,
> terminal emulator. I believe I once saw the Arabeyes project have exactly
> such a system (I don't remember its name).
>
True [replied separately].

> > I want to be able to `cat table.txt'; having to use visual apps for
> > that is a no-no.
>
> But why is "bidiv table.txt" a no-no?
> If you want, you can even rename bidiv "cat" :)
>
It's not a no-no.  In the end I want some semantics for text that
blends well with unicode and bidi.  Whether they are realised by the
terminal itself or with the help of an additional program is less
important.  Once the semantics are agreed upon, bidiv could be regarded
as part of the "terminal" as far as the user is concerned (whether
it's integrated into its code or stays a separate process).

> > models.  I'm not sure yet what's the right direction.  Comments,
> > ideas, request to elaborate or just redirections to the appropriate
> > mailing list :-) are welcome.
>
> The appropriate mailing list is, in my opinion ivrix-discuss.
>
Where are the 2003 archives ?_?  Is it alive?

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

pure virtual static warp shield (TNG++, All Good Things O-=)
