[ Long mail ]
[I'm not sure which list is the most appropriate for discussing this; an internet search shows many lists where these issues are discussed]

I've been thinking for a long time about the issues of proper bidi in terminals. I saw a formulation on some list [mlterm's?] that the terminal's purpose is to simplify writing applications [dealing with text], and I heartily agree - it's vastly easier to use printf than to draw strings in X :-)

I'm looking for the Right Model of terminals that would support bidi well, with minimal hassle to the application. In particular, it's imperative that `cat foo.txt' does the right thing. I don't care much for the existing vt100 model; I'd actually like to take the opportunity to throw most of its complexity out of the window, but that's not a requirement; bidi is.

The model I'm thinking of is based not on a visually ordered matrix of characters but rather on a logical stream of text. The terminal renders the text at any point in time (through bidi, shaping, etc.); non-dumb applications can change this text in the terminal (probably through some escape sequences). This mail comes close:

http://mail.nl.linux.org/linux-utf8/2002-10/msg00006.html

This model stems much more naturally from the requirement that `cat foo.txt' does the right thing.

So this moves the discussion to another level: "plain text" Unicode is not yet adequate to represent many kinds of text that were easily handled by fixed-width ASCII. Most notably, these are mathematical formulae (with their nestable horizontal and vertical layouts and big characters) and all kinds of tables (where texts are shown side by side). Leaving math alone (it's harder), here is the problem with tables:

Many simple unix utils display texts side by side: paste, sdiff, ls -l, etc.

* They assume a fixed-width display. This requires fixing them to use wcwidth(), and even then it is fragile. Perhaps it should be replaced by some convention for signalling the terminal "align this under this" that does not require the application to count anything at all.

* As-is, they interact horribly with Unicode's bidi algorithm. Unless something signals that every column is a separate "paragraph", columns get reordered and the reader is confused. Also, it's usually useful to auto-detect the global direction of each line from its first strongly directional character; this would mirror some lines of the table completely. The best example is probably `sdiff' - I think that even the <|> indicators would be mirrored, so that without any more hints the first file is on the left in English lines and on the right in Hebrew lines.

* Unicode embedding marks could help here (by embedding every column), except that they require you to know the directionality of every cell in order to emit the correct marks. It might be acceptable to expect `sdiff' to output some markers, but it's unreasonable to expect it to check the first strong char in every line! It'd be most useful to have every cell implicitly directioned. I consider this a shortcoming of the Unicode embedding model. Instead, they should have specified only a neutral embedding mark; when you know better, you put LRM/RLM as the first char of the embedded text. This would allow applications to embed text blindly, while allowing the text itself to signal its directionality...

So there is a need to extend "plain text" Unicode with some extra semantics that will allow me to express a table (at least as far as is needed for bidi to work). I found some old discussions on the unicode mailing list that led nowhere. I don't aim at converting the whole world to use these extensions; I just want something that can be used at the command line in interactions between different unix utils (just as there is no end-of-line consensus, yet unix works happily with \n). I want to be able to `cat table.txt'; having to use visual apps for that is a no-no.

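To make the embedding complaint concrete, here is a minimal sketch of the per-cell work that today's explicit embedding marks push onto a tool like `sdiff'. It is an illustration only: the function names (first_strong_direction, embed_cell, embed_row) and the " | " separator are hypothetical, and the first-strong scan is a simplification, not the full Unicode bidi algorithm.

```python
import unicodedata

# Explicit embedding controls from the Unicode bidi algorithm.
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING


def first_strong_direction(text, default="L"):
    """Return 'L' or 'R' from the first strongly directional character.

    Falls back to `default` when the text has no strong character
    (digits, spaces and punctuation alone do not decide a direction).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return default


def embed_cell(text):
    """Wrap one table cell in the embedding its own content calls for."""
    opener = RLE if first_strong_direction(text) == "R" else LRE
    return opener + text + PDF


def embed_row(cells, separator=" | "):
    """Join cells so that each one is laid out as its own bidi run."""
    return separator.join(embed_cell(cell) for cell in cells)
```

Every cell has to be scanned before a single mark can be written, which is exactly the burden argued against above. Later versions of Unicode did add a control of roughly the "neutral" kind wished for here, FIRST STRONG ISOLATE (U+2068), which leaves the direction to be determined from the enclosed text.
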
The principle I'm after: the only differences between a full-screen application and a dumb one should be that the former can go back and modify its output (and that it knows how much of its output fits on the screen); there must be no difference in how convenient it is to present the data.

All this gives me different ideas, most leading to some nested box model. I'm not sure yet which is the right direction.

Comments, ideas, requests to elaborate or just redirections to the appropriate mailing list :-) are welcome.

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>
pure virtual static warp shield (TNG++, All Good Things O-=)
