Re: Spaces on RTL boundaries

Dov Feldstern Sat, 09 Jun 2007 17:35:06 -0700

Georg Baum wrote:

Stefan Schimanski wrote:

Then the question is only from which file format revision we start
with a lyx2lyx filter: from the change of the exporting behavior or
from the fix from today and the last days...


That depends on the format when the change happened. If it was during the
1.5 development cycle I would introduce a new file format now and declare
all 1.5svn versions between the original change and this new format buggy.
We did this for example with format 261.
If it was earlier you can't simply declare stable versions buggy (and people
will have corrected the wrong spaces by hand if they noticed), so it is
better to put the conversion between the old formats. Then it will at least
help people who now convert old documents.


Georg

Hi!

Georg, there never was a format change of this issue, but their *should* have been, because although the format of the lyx file was not changed, its interpretation *was* (not by us now, but back when the original change in the latex generation occurred, see below; as Stefan pointed out, all we did now was to bring the GUI up to date with the change which already occurred then in the latex generation).

The change I'm talking about occurred somewhere between 17143 and 17158 (I think it's 17144, but there are other changes in the above range which may be somewhat related, and format change 259 is somewhere in the middle there, too...).

This is the issue: until 17143 (including), "abc[FED ] ghi" was output to latex as "abc \R{DEF} ghi"; but starting with 17144, the output for the same text (no change in the .lyx file at all!) became "abc\R{ DEF} ghi". So a format change is needed which explicitly changes the .lyx file: any occurrence of "abc[FED ] ghi" should be changed to "abc [FED] ghi", in order that the same latex output be generated. (Note that there is no need for doing anything when downgrading: both forms will be interpreted identically in the old LyX versions, so we can just as easily use the new format, in this respect. It's just that there are some things which couldn't be typeset before 17144, so the explicit differentiation wasn't needed.)

I think that in order to deal with this cleanly, what we have to do is this: we create a new file format *now* (at first I was thinking we should use version 259 to tag these differences, because it was so close to when the changes were actually made; but then I realized the following: if someone has already converted from pre-259 to post-259, then if we now make this lyx2lyx change as part of 259, then that file will never get fixed; so we need to do it now, so that already-converted files will also get fixed. I'm not so worried about post-259 files where the above construct now exists on purpose, because it's very very rare for someone to really want it; in fact, I assume that even now it's just an oversight on the part of the user). This new file format doesn't have any real format changes associated with it, except that the version number gets incremented. Besides that, what the conversion does is to find occurrences of "space" which appear between LTR and RTL sections, and makes sure that they are in the language which has the same direction as the language of the paragraph.


So:

1) Does this sound right?

2) How do I do it in lyx2lyx? I looked at normalize_font_whitespace() ---

*** I now see what the problem is! What we need is *exactly* the change of format 259; but in lyx2lyx, the \\lang property wasn't included for some reason! The attached patch fixes this, and the output is now OK (i.e., with this patch, a file created from before the change (also attached), and in which the problem occurs, is output correctly and displays correctly in the GUI even now!). But anyway, for the reasons outlined above, I still think it's more correct to do this is a separate, new file format, effective as of now. I guess that means duplicating normalize_font_whitespace(), but having it now act *only* on the lang property (Georg, you may want to make sure there aren't any other forgotten properties, once we're at it...), and applying the new function as the converter to a new file format...

So Georg, does all this make sense? And if so, could you please take care of it? I guess we also need Jose's OK for this...


Thanks!
Dov

more_heb_problems.lyx
Description: application/lyx

Index: lyx-devel/src/Bidi.cpp
===================================================================
--- lyx-devel.orig/src/Bidi.cpp 2007-06-08 08:42:47.000000000 +0200
+++ lyx-devel/src/Bidi.cpp      2007-06-08 08:43:30.000000000 +0200
@@ -95,7 +95,11 @@
        pos_type const body_pos = par.beginOfBody();
 
        for (pos_type lpos = start_; lpos <= end_; ++lpos) {
-               bool is_space = par.isLineSeparator(lpos);
+               bool is_space = false;
+               // We do not handle spaces around an RTL segment in a special 
way anymore.
+               // Neither does LaTeX, so setting is_spacs to false makes the 
view in LyX
+               // consistent with the output of LaTeX later. The old setting 
was:
+               // is_space = par.isLineSeparator(lpos);
                pos_type const pos =
                        (is_space && lpos + 1 <= end_ &&
                         !par.isLineSeparator(lpos + 1) &&
Index: lyx-devel/src/Text.cpp
===================================================================
--- lyx-devel.orig/src/Text.cpp 2007-06-08 08:42:47.000000000 +0200
+++ lyx-devel/src/Text.cpp      2007-06-08 10:07:11.000000000 +0200
@@ -732,8 +732,22 @@
                        return;
                }
                BOOST_ASSERT(cur.pos() > 0);
-               if ((par.isLineSeparator(cur.pos() - 1) || 
par.isNewline(cur.pos() - 1))
-                   && !par.isDeleted(cur.pos() - 1)) {
+               
+               // get character position visually in front of cur.pos()
+               pos_type prevpos;
+               if (cur.boundary())
+                       prevpos = cur.pos() - 1;
+               else {
+                       bool rtl = isRTL(cur.buffer(), cur.top(), 
cur.boundary());
+                       prevpos = getPosVisually(cur.bv(), cur.pit(), 
cur.pos(), false, 
+                                                                               
                                         rtl ? 1 : -1);
+               }
+                       
+               // no space if previous was space (or newline),
+               if (prevpos != cur.pos()
+                               && 0 <= prevpos && prevpos < par.size()
+                               && (par.isLineSeparator(prevpos) || 
par.isNewline(prevpos))
+                   && !par.isDeleted(prevpos)) {
                        static bool sent_space_message = false;
                        if (!sent_space_message) {
                                cur.message(_("You cannot type two spaces this 
way. "
Index: lyx-devel/src/Text2.cpp
===================================================================
--- lyx-devel.orig/src/Text2.cpp        2007-06-08 08:42:47.000000000 +0200
+++ lyx-devel/src/Text2.cpp     2007-06-08 10:07:23.000000000 +0200
@@ -1114,6 +1114,72 @@
 }
 
 
+pos_type Text::getPosVisually(BufferView const & bv, pit_type pit,
+  pos_type pos, bool boundary, int visualDist) const
+{
+       Paragraph const & par = getPar(pit);
+       ParagraphMetrics const & pm = bv.parMetrics(this, pit);
+       pos_type oldpos = pos;
+       
+       // if RTL support is not enabled
+       if (!lyxrc.rtl_support) {
+               pos += visualDist;
+               if (pos < 0)
+                       return 0;
+               if (pos > par.size())
+                       return par.size();
+               return pos;
+       }
+       
+       // initialize the starting row and bidi tables
+       Bidi bidi;
+       Row const & row = pm.getRow(pos, boundary);
+       bidi.computeTables(par, *bv.buffer(), row);
+       
+       // try to find position
+       while (true) {
+               // Convert to visual position
+               if (!bidi.inRange(pos))
+                       return oldpos;
+               pos_type vstart = bidi.log2vis(pos);
+               
+               // Go to the character visualDist away
+               //bool backward = bidi.level(corrected_pos) & 1;
+               pos_type vpos = vstart + visualDist; //backward ? (vstart - 
visualDist) : (vstart + visualDist);
+               if (!bidi.inRange(vpos)) {
+                       // end of paragraph?
+                       if (vpos == par.size())
+                               return par.size();
+                       
+                       // out of paragraph? Return oldpos, i.e. we couldn't 
find the position
+                       if (vpos < 0 || vpos > par.size())
+                               return oldpos;
+                       
+                       // we might have changed the row. Try in the next/prev 
one
+                       if (visualDist > 0) {
+                               vstart = row.endpos();
+                               boundary = false;
+                               visualDist -= row.endpos() - vstart;
+                       } else {
+                               vstart = row.pos();
+                               boundary = true;
+                               visualDist -= vstart - row.pos();
+                       }
+               } else {
+                       // We found it -> convert back to logical position
+                       pos = bidi.vis2log(vpos);
+                       break;
+               }
+               
+               // update the bidi tables for next iteration
+               Row const & row = pm.getRow(vstart, boundary);
+               bidi.computeTables(par, *bv.buffer(), row);
+       }
+       
+       return pos;
+}
+
+
 bool Text::deleteEmptyParagraphMechanism(Cursor & cur,
                Cursor & old, bool & need_anchor_change)
 {
@@ -1144,29 +1210,48 @@
 
        bool const same_inset = &old.inset() == &cur.inset();
        bool const same_par = same_inset && old.pit() == cur.pit();
-       bool const same_par_pos = same_par && old.pos() == cur.pos();
+       bool const same_par_pos = same_par && old.pos() == cur.pos() &&
+               old.boundary() == cur.boundary();
 
        // If the chars around the old cursor were spaces, delete one of them.
        if (!same_par_pos) {
-               // Only if the cursor has really moved.
-               if (old.pos() > 0
-                   && old.pos() < oldpar.size()
-                   && oldpar.isLineSeparator(old.pos())
-                   && oldpar.isLineSeparator(old.pos() - 1)
-                   && !oldpar.isDeleted(old.pos() - 1)) {
-                       oldpar.eraseChar(old.pos() - 1, 
cur.buffer().params().trackChanges);
+               // What is position of the char visually after the entered 
space? 
+               // Non-trivial on bidi boundaries:
+               // Case 1: "abc [|WERBEH] ghi" ==> "abc [| WERBEH] ghi"
+               // Case 2: "abc|[ WERBEH] ghi" ==> "abc |[ WERBEH] ghi"
+               // Case 3: "abc [WERBEH ]|ghi" ==> "abc [WERBEH ] |ghi"
+               // Case 4: "abc [WERBEH|] ghi" ==> "abc [WERBEH| ] ghi"
+               
+               // Was a space entered?
+               pos_type spacepos = old.pos() - 1;
+               if (spacepos >= 0 && spacepos < oldpar.size()
+                               && oldpar.isLineSeparator(spacepos)
+                               && !oldpar.isDeleted(spacepos)) {
+                       
+                       // Get visually following character position
+                       pos_type nextpos;
+                       bool rtl = old.text()->isRTL(old.buffer(), old.top(), 
old.boundary());
+                       nextpos = old.text()->getPosVisually(old.bv(), 
old.pit(), spacepos, 
+                                                            false, rtl ? -1 : 
1);
+                       
+                       // Is a space following?
+                       if (nextpos != spacepos 
+                                       && nextpos >= 0 && nextpos < 
oldpar.size()
+                                       && oldpar.isLineSeparator(nextpos)) {
+                               oldpar.eraseChar(spacepos, 
cur.buffer().params().trackChanges);
 #ifdef WITH_WARNINGS
 #warning This will not work anymore when we have multiple views of the same 
buffer
 // In this case, we will have to correct also the cursors held by
 // other bufferviews. It will probably be easier to do that in a more
 // automated way in CursorSlice code. (JMarc 26/09/2001)
 #endif
-                       // correct all cursor parts
-                       if (same_par) {
-                               fixCursorAfterDelete(cur.top(), old.top());
-                               need_anchor_change = true;
+                               // correct all cursor parts
+                               if (same_par) {
+                                       fixCursorAfterDelete(cur.top(), 
old.top());
+                                       need_anchor_change = true;
+                               }
+                               return true;
                        }
-                       return true;
                }
        }
 
Index: lyx-devel/src/Text.h
===================================================================
--- lyx-devel.orig/src/Text.h   2007-06-08 08:42:47.000000000 +0200
+++ lyx-devel/src/Text.h        2007-06-08 08:43:30.000000000 +0200
@@ -365,6 +365,11 @@
        /// now because recordUndo() is called which needs a Cursor.
        static bool deleteEmptyParagraphMechanism(Cursor & cur,
                Cursor & old, bool & need_anchor_change);
+       
+       /// Get logical position visualDist positions visually away from pos, 
+       /// If there is no position like that, the old value of pos is returned
+       pos_type getPosVisually(BufferView const & bv, pit_type pit,
+               pos_type pos, bool boundary, int visualDist) const;
 
        /// delete double spaces, leading spaces, and empty paragraphs
        /// from \first to \last paragraph

Re: Spaces on RTL boundaries

Reply via email to