Dov Feldstern wrote:
Hi!

I think that I've finally tracked down the cause for a problem we've been having for a long time with RTL / Hebrew in LyX 1.5. Specifically, this is the problem described in bug 3040 (see links below), where in the frontend, Hebrew words are placed in the correct position on the screen, but the letters within each word are reversed. There were also some weird interactions between this and the locale settings (for certain *illegal* locale settings, the problems suddenly disappeared); conversely, a change made in r17354-17355 made the problem reappear, even for the illegal locale settings. (Incidentally, also prior to r15893, the bug existed always --- regardless of the locale --- and I was never able to understand how the changes made there had any effect on the Bidi code.) I think the following will explain all these phenomena:

(1) When painting characters to the screen, we try (for efficiency?) to group characters together as much as possible (http://www.lyx.org/trac/browser/lyx-devel/trunk/src/rowpainter.C?rev=17362#L325), and then paint them to the screen as a single string, rather than painting each character separately. Determining when we have to stop the grouping is done in the section of code pointed to above. (I can't necessarily explain *why* that's how the groups are broken up, or indeed if it is even the correct way to do it, but that's how it's done.) One of the conditions for breaking the group is (line 355:) if (!isPrintableNonspace(c)) --- in other words, if a character is not printable, or is a space, we break the group. Let's keep that in mind for now.
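
A rough sketch of the grouping described in (1) might look as follows. This is not the code from rowpainter.C: the names isPrintableNonspaceSketch and groupForPainting are hypothetical, the predicate is a simplified stand-in, and emitting the breaking character as its own chunk is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for isPrintableNonspace(): anything above the
// space character except DEL counts as printable non-space here.
bool isPrintableNonspaceSketch(char32_t c)
{
    return c > U' ' && c != 0x7f;
}

// Split the text into the chunks that would each be handed to a single
// draw call. A character that fails the predicate ends the current chunk
// and (by assumption) is emitted as a chunk of its own.
std::vector<std::u32string> groupForPainting(std::u32string const & text)
{
    std::vector<std::u32string> chunks;
    std::u32string current;
    for (char32_t c : text) {
        if (isPrintableNonspaceSketch(c)) {
            current += c;
            continue;
        }
        if (!current.empty()) {
            chunks.push_back(current);
            current.clear();
        }
        chunks.push_back(std::u32string(1, c));
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

With this sketch, a whole word ends up in one chunk, which is exactly the situation in which a second Bidi pass by the toolkit can reorder its letters.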

(2) LyX uses a built-in Bidi algorithm to determine the correct order for displaying characters on the screen. Internally, the text is stored in logical order. When outputting the text to the screen, the Bidi algorithm is used to determine the "visual" order of the characters. This is performed in Bidi.C (http://www.lyx.org/trac/browser/lyx-devel/trunk/src/Bidi.C). computeTables is used to create the correct mapping between the logical order and the visual order; and vis2log (log2vis) is used to return the correct logical (visual) position for the given visual (logical) position. (As a user of many software applications over the years which have had to deal with mixed Hebrew/English text, I must say that LyX has done a wonderful job. I don't think that there's any other piece of software --- commercial or otherwise --- with which I have had as few problems with respect to Bidi, as with LyX. The credit for this goes to Dekel Tsur, who implemented LyX's Bidi algorithm. Thanks, Dekel!)
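
To illustrate the logical/visual mapping from (2), here is a deliberately simplified sketch. It is not the algorithm in Bidi.C: it assumes a left-to-right paragraph, takes a per-character RTL flag as input, and simply reverses each maximal RTL run. The names BidiTablesSketch and computeTablesSketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct BidiTablesSketch {
    std::vector<std::size_t> vis2log; // visual position -> logical position
    std::vector<std::size_t> log2vis; // logical position -> visual position
};

// Assumes a left-to-right paragraph: LTR characters keep their position,
// and each maximal run of RTL characters is reversed in place.
BidiTablesSketch computeTablesSketch(std::vector<bool> const & isRtl)
{
    std::size_t const n = isRtl.size();
    BidiTablesSketch t;
    t.vis2log.resize(n);
    t.log2vis.resize(n);

    std::size_t i = 0;
    while (i < n) {
        if (!isRtl[i]) {
            t.vis2log[i] = i;
            ++i;
            continue;
        }
        // Find the end of the RTL run [i, end).
        std::size_t end = i;
        while (end < n && isRtl[end])
            ++end;
        // Visual slot i + k shows the logical character end - 1 - k.
        for (std::size_t k = 0; k < end - i; ++k)
            t.vis2log[i + k] = end - 1 - k;
        i = end;
    }
    // log2vis is the inverse permutation of vis2log.
    for (std::size_t v = 0; v < n; ++v)
        t.log2vis[t.vis2log[v]] = v;
    return t;
}
```

For a five-character row whose middle three characters are Hebrew, the sketch leaves positions 0 and 4 alone and maps visual positions 1, 2, 3 to logical positions 3, 2, 1.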

(3) Qt 4 applies its own Bidi algorithm to QStrings painted with drawText. So if a string which contains an entire word in Hebrew is painted, the letters will be reversed (the QString is assumed to be in logical order).

(4) Put (2) and (3) together, and words get reversed twice, which means they are back in logical order when displayed on the screen. *This is the basic problem that we currently have*. It's new to 1.5, I guess, for one or more of the following reasons:
* Earlier versions of Qt don't apply the Bidi algorithm to painted strings?
* Qt (of earlier versions, and/or Qt4) doesn't apply the Bidi algorithm to non-Unicode strings?

(5) So what happened between r15893 and r17354, and what does this have to do with the locale settings? Well, going back to (1): prior to r15893, isPrintableNonspace(char c) was implemented like this: return (c & 127) > ' '; Hebrew characters would be identified (correctly) as isPrintableNonspace, and would therefore be grouped together --- meaning that, as explained above, the string would be reversed. A space would (also correctly) be identified as such, and would therefore break the group --- that's why the order of the words was still okay.

But the above method for determining isPrintableNonspace is incorrect for Unicode, and so in r15893 this was changed to use the iswprint() and iswspace() functions from wctype.h. These depend on the locale settings (specifically, LC_CTYPE) to perform correctly. So when the locale was set, the same things as explained before would happen, and the letters in each word would still get reversed. However, if the locale wasn't set, or was illegal, then Hebrew characters would not be identified as printable; thus, every Hebrew character would break the grouping; and each character would be painted to the screen separately. When this happens, (3) is irrelevant (there's only one character, nothing for Qt to reverse!), and therefore only LyX's Bidi algorithm is working, and the output is correct!

In r17354/5, the isw...() functions were replaced by a different method for determining these classes, which does not depend on the locale settings. Thus, we're back to the original situation: Hebrew characters are *correctly* identified as PrintableNonspaces, and therefore grouped together while painting, and getting reversed by Qt.
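
The two predicates discussed in (5) can be put side by side as a sketch. The first body is the expression quoted above; the second is only an assumption about how a version built on iswprint() and iswspace() could look, not the actual r15893 code. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cwctype>

// The pre-r15893 test as quoted in the text: the mask drops the high bit,
// so any 8-bit character whose low seven bits are above ' ' passes.
bool isPrintableNonspaceOld(char c)
{
    return (c & 127) > ' ';
}

// Assumed shape of a locale-dependent variant. The result for non-ASCII
// characters depends on the LC_CTYPE category of the current locale.
bool isPrintableNonspaceLocale(wchar_t c)
{
    std::wint_t const wc = static_cast<std::wint_t>(c);
    return std::iswprint(wc) != 0 && std::iswspace(wc) == 0;
}
```

The second function is the one whose answer for Hebrew letters can change with the locale, which would account for the locale-dependent behaviour described above.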

So that explains the bug. Now, to the possible solutions:

(1) Paint Hebrew/Arabic characters one at a time, so that Qt's Bidi algorithm doesn't get applied. This is the easiest solution, and also the most conservative, and therefore least likely to introduce new bugs. I think that this is definitely the way to go at least until 1.5.0 is released. This does have the disadvantage, however, of painting the characters one at a time, which may be less efficient (does anyone know if this really makes a significant difference?).
Here is a way in which you can test this:
1. Make a big document with, say, 50 pages of Hebrew/Arabic text only.
2. Test how much time is needed to scroll through it while holding
   down the down-arrow key. Compare with an equally long document
   full of Roman text only. This test will paint every word on screen.
   Perhaps a test with page-down brings out more differences; the
   down-arrow test might end up testing video scrolling speed instead.
   Make sure all tests are done with the same window size. Maximizing
   is one way.
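
For reference, solution (1) above could be sketched like this. It is only an illustration under stated assumptions: isRtlCharSketch and chunksWithSingleRtl are hypothetical names, the RTL test covers only the Unicode Hebrew (U+0590 to U+05FF) and Arabic (U+0600 to U+06FF) blocks, and the real break conditions in rowpainter.C are more involved.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified RTL test: only the basic Hebrew and Arabic Unicode blocks.
bool isRtlCharSketch(char32_t c)
{
    return (c >= 0x0590 && c <= 0x05FF) || (c >= 0x0600 && c <= 0x06FF);
}

// Like ordinary grouping, but every RTL character also breaks the group
// and is emitted alone, so the toolkit never sees an RTL run to reorder.
std::vector<std::u32string> chunksWithSingleRtl(std::u32string const & text)
{
    std::vector<std::u32string> chunks;
    std::u32string current;
    for (char32_t c : text) {
        bool const breaksGroup = c <= U' ' || isRtlCharSketch(c);
        if (!breaksGroup) {
            current += c;
            continue;
        }
        if (!current.empty()) {
            chunks.push_back(current);
            current.clear();
        }
        chunks.push_back(std::u32string(1, c));
    }
    if (!current.empty())
        chunks.push_back(current);
    return chunks;
}
```

Counting the chunks this produces for a Hebrew-only document against a Roman-only one gives a first idea of how many extra draw calls the timing test above would be measuring.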


(3) The reverse of (2): stop using our own Bidi algorithm altogether, and only rely on Qt's. Abdel, I know that you're in favor of this suggestion ;). I also see that it could have certain advantages: we wouldn't have to maintain our own bidi code; we may be able to paint much larger chunks of text at once --- we
Yes - having LyX doing less work is definitely the way to go - at least in the
long run. If we ever go for another frontend, then that frontend
had better support bidi too. (Or whoever pushes that frontend can make
frontend-specific bidi support for it.)

Helge Hafting
