Hi!
Here I will try to explain my understanding of the issues leading to
bugs 1820 and 3613. This is partly because I feel that it's important
that as many people as possible be aware of these, for future
development; partly in order to explain the patches I'm proposing for
these two bugs, so that together we can try and find any problems with
them which still need to be fixed; and partly to help myself identify
what exactly the changes being made are, so that we can create the
appropriate lyx2lyx change for 1820, if necessary. Also, my
understanding is still far from complete, so please correct me if you
think I made any mistakes.
We're dealing with language/font switches within a document, and
specifically around insets (I'm talking about Footnotes, Notes --- I'm
not sure about other types of insets).
As far as the UI is concerned, this is how that works: if the user is
typing in language A, and then opens a footnote, then the language
remains language A, also inside the footnote. When leaving the footnote,
the language is still language A. In short, the language remains
unchanged until explicitly changed by the user. Font changes (emphasis,
bold, etc.) are slightly different. They remain in effect until the end
of a paragraph, and then revert back to normal. If the user opens a
footnote while one of these fonts is in effect, that font remains in
effect inside the footnote, until the end of the first paragraph in the
inset. The next paragraph already reverts back to normal. When leaving
the footnote, the same font that was in effect when entering it is still
in effect. So far this was the UI.
As far as the .lyx file is concerned: languages remain in effect until
explicitly switched; the exception, though, is insets: inside an inset,
the language is assumed to revert to the document language, unless
explicitly told otherwise. This means, for example, the following:
Document language is A, user is currently typing in language B, and
opens an inset. Inside the inset, the user continues typing in language
B. In the .lyx file, we'll see an explicit \lang B command inside the
inset, even though the user didn't actually have to enter any language
command. Conversely, let's say that inside the inset the user starts
typing in language A again. In order to do that, he had to enter an
explicit language switch. However, in the .lyx file, there will be no
\lang command! It'll just look like this:
\lang B
outside the inset
\begin_inset Foot
\begin_layout Standard
inside inset, language is now A
So this is one confusing aspect, but it works perfectly fine once you
know that that's what's happening. Regarding font changes, they also
revert inside the inset. So this was the .lyx perspective.
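To make the reset rule concrete, here is a minimal Python sketch of the bookkeeping described above. It is only an illustration: the function name effective_languages and the simplified line-by-line input are assumptions made for this example, not LyX or lyx2lyx code, and a real .lyx file has much more structure than this.

```python
def effective_languages(lines, doc_lang):
    """Return (language, text) pairs for the text lines of a simplified .lyx body.

    Hypothetical helper: it models only the rule described in this message,
    namely that a language stays in effect until switched, except that
    entering an inset resets the language to the document language.
    """
    stack = [doc_lang]          # one entry per nesting level (document, inset, ...)
    result = []
    for line in lines:
        if line.startswith("\\lang "):
            # explicit switch: affects only the current nesting level
            stack[-1] = line.split(None, 1)[1].strip()
        elif line.startswith("\\begin_inset"):
            # inside an inset the language reverts to the document language
            stack.append(doc_lang)
        elif line.startswith("\\end_inset"):
            # leaving the inset restores the language of the surrounding text
            stack.pop()
        elif line.startswith("\\"):
            continue            # other commands are ignored in this sketch
        elif line.strip():
            result.append((stack[-1], line))
    return result


sample = [
    "\\lang B",
    "outside the inset",
    "\\begin_inset Foot",
    "\\begin_layout Standard",
    "inside inset, language is now A",
    "\\end_layout",
    "\\end_inset",
    "outside again",
]
languages = effective_languages(sample, doc_lang="A")
```

With the sample above (document language A), the text inside the footnote comes out as language A even though no \lang command appears there, and the text after the inset is B again.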
Finally, we'll look at the latex point of view: from latex's point of
view, a language stays in effect until it is explicitly closed. So the
language that is in effect upon entering a footnote is the language
inside the footnote, until explicitly switched. Fonts are also actually
like that, I think, except that you are required to close the font
commands at the end of a paragraph. In terms of the actual language
switch commands, babel provides two sets of commands for language
switches: \selectlanguage, which is meant for switching entire
paragraphs, and also affects the date format, the direction in Bidi
text, etc.; and \foreignlanguage{}{}, which is meant for insertions
within a paragraph. Note that \foreignlanguage accepts the text as an
argument, and hence doesn't allow multiple paragraphs; i.e., the command
must be closed when a paragraph ends. It also doesn't allow an inset,
because that inset may have multiple paragraphs. Generally, LyX
generates latex as follows: the paragraph's language, which is
determined by the language at its beginning, will be set with
\selectlanguage. If there are language switches in the middle of the
paragraph, those will be inserted inside \foreignlanguage{}{} commands.
If the paragraph language is A, and then the user switches to B, and
then opens an inset, and continues typing in B inside the inset, LyX
will have switched to B with \foreignlanguage, will close this before
the inset, open the inset, and then inside it immediately set the
language back to B again, this time using a \selectlanguage command;
like this:
\selectlanguage{A}beginning of paragraph\foreignlanguage{B}{this is some
text in B}\footnote{\selectlanguage{B}more B}\foreignlanguage{B}{still
B} back to A again.
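The generation rule for the non-Bidi case can be sketched in a few lines of Python. This is not the LyX output code; the function paragraph_to_latex and the (kind, lang, text) segment tuples are hypothetical names invented for this illustration, and the sketch covers only a single paragraph with single-paragraph footnotes.

```python
def paragraph_to_latex(para_lang, segments):
    """Build the LaTeX for one paragraph (non-Bidi case only).

    segments is a list of (kind, lang, text) tuples, where kind is
    "text" or "foot". Hypothetical data model, for illustration only.
    """
    # the paragraph language is set once, with \selectlanguage
    out = ["\\selectlanguage{%s}" % para_lang]
    for kind, lang, text in segments:
        if kind == "foot":
            # The footnote is emitted outside any \foreignlanguage, so on the
            # LaTeX side the paragraph language is in effect when it opens.
            # Switch explicitly if the footnote text is in another language.
            switch = "" if lang == para_lang else "\\selectlanguage{%s}" % lang
            out.append("\\footnote{%s%s}" % (switch, text))
        elif lang == para_lang:
            out.append(text)
        else:
            # local switch inside the paragraph
            out.append("\\foreignlanguage{%s}{%s}" % (lang, text))
    return "".join(out)


example = paragraph_to_latex("A", [
    ("text", "A", "beginning of paragraph"),
    ("text", "B", "this is some text in B"),
    ("foot", "B", "more B"),
    ("text", "B", "still B"),
    ("text", "A", " back to A again."),
])
```

For these segments the sketch produces the same string as the example above: the \foreignlanguage is closed before the footnote and reopened after it.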
As far as the output is concerned, it looks just as if the inset itself
were included in the \foreignlanguage command, right? Well, this is
where RTL gets into the picture: in Bidi text, it *does* make a
difference whether the inset is embedded within the language switch or
not. If the text logically looks like this (where {} is RTL text, [1] is
a footnote):
a...m{A...M}[1]{N...Z}n...z
it will display visually as:
a...m{M...A}[1]{Z...N}n...z
Whereas this logical text (the only difference being that the footnote
is now inside the language switch):
a...m{A...M[1]N...Z}n...z
will display visually like this:
a...m{Z...N[1]M...A}n...z
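The two cases can be reproduced with a small Python sketch. It uses the same notation as above ({} for an RTL run, [n] for a footnote mark) and a deliberately naive model: the items of an RTL run are laid out in reverse order, text is reversed character by character, and a footnote mark is treated as one atomic unit. The function visual_order is a hypothetical name, nested runs are not handled, and this is in no way the real Bidi algorithm.

```python
import re

# a footnote mark like [1], a brace, or a stretch of plain text
TOKEN = re.compile(r"\[\d+\]|[{}]|[^\[\]{}]+")


def visual_order(logical):
    """Map the logical notation used above to its visual order (toy model)."""
    out = []
    run = None                      # items collected while inside an RTL run
    for tok in TOKEN.findall(logical):
        if tok == "{":
            run = []
        elif tok == "}":
            # lay out the run right to left: reverse the item order, reverse
            # the characters of text, keep footnote marks intact
            items = [t if t.startswith("[") else t[::-1] for t in reversed(run)]
            out.append("{" + "".join(items) + "}")
            run = None
        elif run is not None:
            run.append(tok)
        else:
            out.append(tok)         # LTR text and marks outside a run are unchanged
    return "".join(out)


outside = visual_order("a...m{A...M}[1]{N...Z}n...z")
inside = visual_order("a...m{A...M[1]N...Z}n...z")
```

Here outside and inside correspond to the two visual strings shown above: with the footnote between two separate RTL runs, each run is reversed on its own; with the footnote inside a single run, the whole run is reversed around it.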
So we see that in Bidi text it makes a difference. Therefore, for Bidi
we use \L and \R (or variants, depending on the specific language)
instead of the \foreignlanguage commands. And \L and \R allow nesting,
and allow paragraph breaks. BTW, this difference also manifests itself
in the GUI: you may have noticed that the language of the inset and the
language of the text inside the inset can be set independently. This may
have seemed strange, but now you see why that is: the language of the
inset itself is what will determine whether the inset is embedded in the
language switch or not; but independently, the language of the text can
be in any other language, without affecting the flow of the text around
the inset. The GUI already behaves this way. When generating latex, we
will now have to be careful about the same things: sometimes we'll want
to keep the language open, and nest the footnote inside the \L; so in
this case there's a difference between bidi and non-bidi language switches.
Up to now I described the background.
Now, to the bugs and patches:
Bug 3613 was a result of the fact that during one of the lyx2lyx
conversions to utf8, when the encodings were determined, the fact that
the language is reset to the document language (from the .lyx point of
view) inside insets was not taken into account. So in these situations
where it made a difference, the text was getting encoded using the wrong
language (encoding). The patch is conceptually fairly simple: just
identify those cases, and re-encode them into the correct encoding this time.
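The re-encoding step itself can be illustrated with a short Python sketch. This is not the lyx2lyx patch: the function reencode is a hypothetical helper, and cp1255 and latin1 are picked purely as example encodings. It only shows the general idea of undoing a decode that was done with the wrong encoding.

```python
def reencode(text, wrong_encoding, right_encoding):
    """Undo a decode that used the wrong encoding (illustrative helper).

    The text is turned back into the original bytes using the encoding that
    was wrongly assumed, and those bytes are then decoded correctly.
    """
    raw = text.encode(wrong_encoding)
    return raw.decode(right_encoding)


# Example: Hebrew text stored as cp1255 bytes but decoded as latin1, as could
# happen if the document language's encoding were assumed inside an inset.
original = "\u05e9\u05dc\u05d5\u05dd"
garbled = original.encode("cp1255").decode("latin1")
fixed = reencode(garbled, "latin1", "cp1255")
```

This only works when the wrong decode was lossless, as it is with latin1 here; the hard part of the actual fix is identifying which stretches of text were affected.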
Bug 1820 is a whole set of problems with how the latex was being
generated. Basically, if you just keep the facts explained above in
mind, then making sure that the UI is translated correctly to the .lyx file,
and then the .lyx file is translated correctly to latex, is just a
matter of "accounting". The conversion from the UI to .lyx appears to
have been done correctly, but there were some issues that were getting
messed up in the transition from the .lyx to latex --- mainly (but not
only) with RTL issues.
So now, I think, is the time to take a look at the patch itself. Again,
keeping the above facts in mind, it shouldn't be *that* complicated to
understand what's going on. What's a little confusing in the
implementation, is that there are all kinds of pieces of information
(the current font, language, encoding) which need to be kept track of at
different levels (document, paragraph, local switch within the
paragraph, inset); and each piece of information is stored a little
differently: some are passed around on the calling stack; some are
stored semi-globally, etc. But again, keeping all of this in mind, I
think it's possible to understand the patch.
Hope this helps (I think it helped me to write it out)!
Dov
P.S. The patch for 1820 was sent in last night. It's still not in its final
form; I have a few issues to work out. After having written this out,
maybe I'll finally be able to fix those last issues. But I think that
even now, it's better than the current situation; but I need a little
more time.
Patch 3613 was sent in earlier tonight. Please try them out, and report
any problems you see with them. I'd like these to go in for 1.5.0, but I
would like them to be well tested first.
Again, I suggest perhaps branching 1.5.x now, so that this doesn't hold
up development on the trunk. That'll give us some time to work out the
final issues that still need to be worked out for the 1.5 release.