Hi!

Here I will try to explain my understanding of the issues leading to bugs 1820 and 3613. This is partly because I feel that it's important that as many people as possible be aware of these, for future development; partly in order to explain the patches I'm proposing for these two bugs, so that together we can try and find any problems with them which still need to be fixed; and partly to help myself identify what exactly the changes being made are, so that we can create the appropriate lyx2lyx change for 1820, if necessary. Also, my understanding is still far from complete, so please correct me if you think I made any mistakes.

We're dealing with language/font switches within a document, and specifically around insets (I'm talking about Footnotes, Notes --- I'm not sure about other types of insets).

As far as the UI is concerned, this is how that works: if the user is typing in language A, and then opens a footnote, then the language remains language A, also inside the footnote. When leaving the footnote, the language is still language A. In short, the language remains unchanged until explicitly changed by the user. Font changes (emphasis, bold, etc.) are slightly different. They remain in effect until the end of a paragraph, and then revert back to normal. If the user opens a footnote while one of these fonts is in effect, that font remains in effect inside the footnote, until the end of the first paragraph in the inset. The next paragraph already reverts back to normal. When leaving the footnote, the same font that was in effect when entering it is still in effect. So far this was the UI.

As far as the .lyx file is concerned: languages remain in effect until explicitly switched; the exception, though, is insets: inside an inset, the language is assumed to revert to the document language, unless explicitly told otherwise. This means, for example, the following: Document language is A, user is currently typing in language B, and opens an inset. Inside the inset, the user continues typing in language B. In the .lyx file, we'll see an explicit \lang B command inside the inset, even though the user didn't actually have to enter any language command. Conversely, let's say that inside the inset the user starts typing in language A again. In order to do that, he had to enter an explicit language switch. However, in the .lyx file, there will be no \lang command! It'll just look like this:

\lang B
outside the inset
\begin_inset Foot
\begin_layout Standard
inside inset, language is now A

So this is one confusing aspect, but it works perfectly fine once you know that that's what's happening. Regarding font changes, they also revert inside the inset. So this was the .lyx perspective.
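To make the rule above concrete, here is a minimal Python sketch of how one might compute the language that is actually in effect for each line of text. It is only an illustration, not LyX or lyx2lyx code: the function name effective_languages is made up, and it assumes a heavily simplified .lyx body in which insets are delimited by \begin_inset and \end_inset lines and a language switch is a line of the form "\lang X".

```python
def effective_languages(lines, document_language):
    """Yield (text, language) pairs for the text lines of a simplified .lyx body.

    Assumptions (for illustration only): insets are delimited by
    \\begin_inset / \\end_inset lines, and a '\\lang X' line switches the
    language at the current nesting level.
    """
    # One entry per nesting level; the last entry is the language in effect.
    stack = [document_language]
    for line in lines:
        if line.startswith("\\begin_inset"):
            # Entering an inset: the language reverts to the document language.
            stack.append(document_language)
        elif line.startswith("\\end_inset"):
            # Leaving the inset: the outer language is in effect again.
            if len(stack) > 1:
                stack.pop()
        elif line.startswith("\\lang "):
            # Explicit switch, valid at the current nesting level only.
            stack[-1] = line.split(None, 1)[1].strip()
        elif line and not line.startswith("\\"):
            # Anything that is not a command is treated as text.
            yield line, stack[-1]


sample = [
    "\\lang B",
    "outside the inset",
    "\\begin_inset Foot",
    "\\begin_layout Standard",
    "inside inset, language is now A",
    "\\end_layout",
    "\\end_inset",
    "outside again",
]
for text, language in effective_languages(sample, "A"):
    print(language, "|", text)
```

With document language A, the text inside the footnote comes out as A even though no \lang command appears there, and the text after the inset is B again.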

Finally, we'll look at the latex point of view: from latex's point of view, a language stays in effect until it is explicitly closed. So the language that is in effect upon entering a footnote is the language inside the footnote, until explicitly switched. Fonts are also actually like that, I think, except that you are required to close the font commands at the end of a paragraph.

In terms of the actual language switch commands, babel provides two sets of commands for language switches: \selectlanguage, which is meant for switching entire paragraphs, and also affects the date format, the direction in Bidi text, etc.; and \foreignlanguage{}{}, which is meant for insertions within a paragraph. Note that \foreignlanguage accepts the text as an argument, and hence doesn't allow multiple paragraphs; i.e., the command must be closed when a paragraph ends. It also doesn't allow an inset, because that inset may have multiple paragraphs.

Generally, LyX generates latex as follows: the paragraph's language, which is determined by the language at its beginning, will be set with \selectlanguage. If there are language switches in the middle of the paragraph, those will be inserted inside \foreignlanguage{}{} commands. If the paragraph language is A, and then the user switches to B, and then opens an inset, and continues typing in B inside the inset, LyX will have switched to B with \foreignlanguage, will close this before the inset, open the inset, and then inside it immediately set the language back to B again, this time using a \selectlanguage command; like this:

\selectlanguage{A}beginning of paragraph\foreignlanguage{B}{this is some text in B}\footnote{\selectlanguage{B}more B}\foreignlanguage{B}{still B} back to A again.
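As a rough sketch of the generation rule just described (non-Bidi case only), the following Python function builds a line of the same shape as the example above. It is not how LyX implements this: the function paragraph_to_latex and the (kind, language, text) item format are made up for illustration, and it assumes that a footnote whose text is in the paragraph language needs no switch inside it.

```python
def paragraph_to_latex(paragraph_language, items):
    """Build the latex for one paragraph.

    items is a list of (kind, language, text) tuples, where kind is
    "text" or "footnote". Hypothetical data model, for illustration only.
    """
    # The paragraph language is set with \selectlanguage.
    out = ["\\selectlanguage{%s}" % paragraph_language]
    for kind, language, text in items:
        if kind == "footnote":
            # Any open \foreignlanguage was closed by the previous item, so
            # inside the footnote the paragraph language is in effect.
            body = text
            if language != paragraph_language:
                body = "\\selectlanguage{%s}%s" % (language, text)
            out.append("\\footnote{%s}" % body)
        elif language == paragraph_language:
            out.append(text)
        else:
            # A switch within the paragraph takes its text as an argument.
            out.append("\\foreignlanguage{%s}{%s}" % (language, text))
    return "".join(out)


print(paragraph_to_latex("A", [
    ("text", "A", "beginning of paragraph"),
    ("text", "B", "this is some text in B"),
    ("footnote", "B", "more B"),
    ("text", "B", "still B"),
    ("text", "A", " back to A again."),
]))
```

Note how the run in B is emitted as two separate \foreignlanguage commands, one on each side of the footnote, because the footnote cannot go inside the argument.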

As far as the output is concerned, it looks just as if the inset itself were included in the \foreignlanguage command, right? Well, this is where RTL gets into the picture: in Bidi text, it *does* make a difference whether the inset is embedded within the language switch or not. If the text logically looks like this (where {} is RTL text, [1] is a footnote):

a...m{A...M}[1]{N...Z}n...z

it will display visually as:

a...m{M...A}[1]{Z...N}n...z

Whereas this logical text, where the only difference is that the footnote is inside the language switch:

a...m{A...M[1]N...Z}n...z

will display visually like this:

a...m{Z...N[1]M...A}n...z

So we see that in Bidi text it makes a difference. Therefore, for Bidi we use \L and \R (or variants, depending on the specific language) instead of the \foreignlanguage commands. And \L and \R allow nesting, and allow paragraph breaks. BTW, this difference also manifests itself in the GUI: you may have noticed that the language of the inset and the language of the text inside the inset can be set independently. This may have seemed strange, but now you see why that is: the language of the inset itself is what will determine whether the inset is embedded in the language switch or not; but independently, the text inside it can be in any other language, without affecting the flow of the text around the inset. The GUI already behaves this way. When generating latex, we will now have to be careful about the same things: sometimes we'll want to keep the language open, and nest the footnote inside the \L; so in this case there's a difference between bidi and non-bidi language switches.
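The reordering in the two Bidi examples can be reproduced with a toy Python model. This is a deliberately simplified sketch, not the real Unicode bidirectional algorithm and not LyX code: the function visual_order is made up, the paragraph is assumed to be left-to-right overall, and each RTL run is given as a list of units (letters, or a footnote mark such as "[1]") that is simply displayed in reverse.

```python
def visual_order(items):
    """Return the display order of a left-to-right paragraph.

    Each item is either a plain string (an LTR unit, shown as is) or a
    list of units forming one RTL run, whose units are shown reversed.
    Toy model for illustration; the braces only mark the RTL runs.
    """
    shown = []
    for item in items:
        if isinstance(item, list):
            # RTL run: the units keep their content but reverse their order.
            shown.append("{" + "".join(reversed(item)) + "}")
        else:
            shown.append(item)
    return "".join(shown)


# Footnote between two separate RTL runs.
print(visual_order(["a...m", ["A", "...", "M"], "[1]", ["N", "...", "Z"], "n...z"]))

# Footnote embedded inside a single RTL run.
print(visual_order(["a...m", ["A", "...", "M", "[1]", "N", "...", "Z"], "n...z"]))
```

In the first call the footnote mark stays between the two reversed runs; in the second it moves along with the run it is embedded in, which is exactly the difference shown in the examples.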

Up to now I described the background.

Now, to the bugs and patches:

Bug 3613 was a result of the fact that during one of the lyx2lyx conversions to utf8, when the encodings were determined, the fact that the language is reset to the document language (from the .lyx point of view) inside insets was not taken into account. So in those situations where it made a difference, the text was getting decoded using the encoding of the wrong language. The patch is conceptually fairly simple: just identify those cases, and re-encode them, this time using the correct encoding.
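The general idea of such a repair can be sketched in a few lines of Python. This is not the actual lyx2lyx patch, just an illustration of the technique: the function reencode is made up, the encodings latin-1 and cp1255 are arbitrary choices for the example, and the sketch assumes that the wrong encoding can represent every character of the garbled text, so that the original bytes can be recovered.

```python
def reencode(text, wrong_encoding, right_encoding):
    """Repair text that was decoded with the wrong encoding.

    Encoding with the wrong encoding recovers the original bytes, which
    are then decoded with the encoding that should have been used.
    Hypothetical helper, for illustration only.
    """
    return text.encode(wrong_encoding).decode(right_encoding)


# Simulate the bug: Hebrew bytes (cp1255) read as if they were latin-1.
original = "\u05e9\u05dc\u05d5\u05dd"
garbled = original.encode("cp1255").decode("latin-1")

repaired = reencode(garbled, "latin-1", "cp1255")
print(repaired == original)
```

The hard part in practice is the "identify those cases" step, i.e. knowing which encoding was wrongly assumed for which piece of text; the sketch takes that as given.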

Bug 1820 is a whole set of problems with how the latex was being generated. Basically, if you just keep the facts explained above in mind, then making sure that UI is translated correctly to the .lyx file, and then the .lyx file is translated correctly to latex, is just a matter of "accounting". The conversion from the UI to .lyx appears to have been done correctly, but there were some issues that were getting messed up in the transition from the .lyx to latex --- mainly (but not only) with RTL issues.

So now, I think, is the time to take a look at the patch itself. Again, keeping the above facts in mind, it shouldn't be *that* complicated to understand what's going on. What's a little confusing in the implementation, is that there are all kinds of pieces of information (the current font, language, encoding) which need to be kept track of at different levels (document, paragraph, local switch within the paragraph, inset); and each piece of information is stored a little differently: some are passed around on the calling stack; some are stored semi-globally, etc. But again, keeping all of this in mind, I think it's possible to understand the patch.

Hope this helps (I think it helped me to write it out)!
Dov

P.S. So patch 1820 was sent in last night. It's still not in its final form; I have a few issues to work out. After having written this out, maybe I'll finally be able to fix those last issues. I think that even now it's better than the current situation, but I need a little more time. Patch 3613 was sent in earlier tonight. Please try them out, and report any problems you see with them. I'd like these to go in for 1.5.0, but I would like them to be well tested first. Again, I suggest perhaps branching 1.5.x now, so that this doesn't hold up development on the trunk. That'll give us some time to work out the final issues that still need to be worked out for the 1.5 release.
