Here's an outline for how we could deal with overlapping ranges, using
character attributes.
First, we must understand exactly what the problem is. It is currently
perfectly possible (and should remain so) to create markup in the GUI
which would look like this:
<b>abc <i>def</b> ghi</i>
However, this is illegal XML (and LaTeX?): a second range is not allowed
to start inside a first range and then extend beyond the end of that
first range.
Note, however, that this is not necessarily semantically wrong (as TEI
recognizes, see http://www.tei-c.org/P4X/NH.html). Also note that LyX
currently allows this behavior in the GUI.
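
To make the "illegal overlap" condition concrete, here is a minimal
Python sketch. It is not LyX code: the Range type and the crosses()
function are hypothetical names used only for illustration, and ranges
are assumed to be half-open character offsets into the text.

```python
from typing import NamedTuple


class Range(NamedTuple):
    name: str
    start: int  # offset of the first character covered
    end: int    # offset just past the last character covered


def crosses(a: Range, b: Range) -> bool:
    """True if one range starts strictly inside the other and ends beyond it."""
    # Put the range that starts first (or, on a tie, the longer one) first.
    first, second = sorted((a, b), key=lambda r: (r.start, -r.end))
    return first.start < second.start < first.end < second.end


# <b>abc <i>def</b> ghi</i> over the text "abc def ghi"
bold = Range("b", 0, 7)     # covers "abc def"
italic = Range("i", 4, 11)  # covers "def ghi"
print(crosses(bold, italic))  # True
```

Properly nested or disjoint ranges return False, so only the crossing
case needs any rewriting.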
So basically, what we need to do is to take overlapping ranges and force
them into legal XML/LaTeX, without changing the semantics.
The trivial way to do this is just to say: well, whichever range starts
first I consider to be the outer range; and when I reach its end, I
must also close all inner ranges. If some of those inner ranges are
still open, then I will open them again after closing the outer range,
and now they are outside, so everything is OK.
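
A minimal Python sketch of this naive approach, under some assumptions
of mine: ranges are non-empty (tag, start, end) tuples with an exclusive
end, and the order of the list stands in for "which range started
first". The function name serialize_naive is hypothetical, not anything
in LyX.

```python
def serialize_naive(text, ranges):
    """Emit well-nested markup; the range opened first is the outer one.

    ranges: list of (tag, start, end) tuples, end exclusive, non-empty,
    listed in the order in which they were started.
    """
    out = []
    stack = []  # currently open ranges, outermost first
    for pos in range(len(text) + 1):
        ending = [r for r in stack if r[2] == pos]
        if ending:
            # Everything from the outermost ending range inward must close.
            cut = stack.index(ending[0])
            closed = stack[cut:]
            del stack[cut:]
            for tag, _, _ in reversed(closed):
                out.append(f"</{tag}>")
            # Reopen the ranges that were closed only to keep nesting legal.
            for r in closed:
                if r[2] != pos:
                    stack.append(r)
                    out.append(f"<{r[0]}>")
        # Open the ranges starting here, in the order they were given.
        for r in ranges:
            if r[1] == pos:
                stack.append(r)
                out.append(f"<{r[0]}>")
        if pos < len(text):
            out.append(text[pos])
    return "".join(out)


print(serialize_naive("abc def ghi", [("b", 0, 7), ("i", 4, 11)]))
# <b>abc <i>def</i></b><i> ghi</i>
```

Note that when two ranges start at the same position, the result depends
entirely on the order of the input list.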
However, there are two problems with this naive approach:
1) What if we don't know which starts first? For example, if at a given
position in the GUI we toggled both <b> and <i> on, then which is the
"first" and which is "second"? Same question if I just closed an outer
range, and now have to reopen multiple inner ranges which were inside it
--- which of those inner ranges is first?
2) Are we sure that this preserves the semantic meaning of the original
text?
Regarding (2), the only example that I can think of is Bidi text, if
language is treated as a character attribute (which it currently is, and
rightly so, IMO). In this case, it matters very much whether I render:
text: abc def ghi
RTL:  *******
emph: ***
as
<RTL><emph>abc</emph> def</RTL> ghi
or as
<emph><RTL>abc</RTL></emph><RTL> def</RTL> ghi
Since the first would appear visually (CAPS == RTL, underline == emph) as:
FED CBA ghi
    ___
and the second would appear as:
CBA FED ghi
___
(the first is correct, in this case).
So yes, it does sometimes matter, though I can't think of any non-bidi
cases where this is so.
So here's my suggested solution, which would solve both issues (1) and (2):
We define an "attribute precedence" order. Then, we use the following
rules (to be applied when moving from the GUI to non-overlapping markup)
to make sure that at any given position, the highest-precedence active
range is also the outermost one:
I) When opening two different ranges at the same position, we first open
the one with higher precedence, and only then do we open the one with
lower precedence;
II) When closing two different ranges at the same position, we first
close the one with lower precedence, and only then do we close the one
with higher precedence;
III) When opening a range while other ranges are active, we first close
all active lower-precedence ranges (according to rule II); then open the
current range, and then reopen all the ranges which were closed
(according to rule I);
IV) When closing a range while other ranges are active, we first close
all active lower-precedence ranges (according to rule II); then close
the current range, and then reopen all ranges which were closed
(according to rule I).
Right now, Bidi gets the highest precedence. Beyond that, the order
doesn't really matter, but we should still define one, so that the
behavior is deterministic.
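
One possible way to sketch rules I-IV in Python is to keep the stack of
open tags sorted by precedence at every position: whatever differs from
the wanted nesting gets closed innermost-first and then (re)opened
outermost-first. This is only an illustration under my own assumptions:
the function name serialize_by_precedence is hypothetical, ranges are
(tag, start, end) tuples with an exclusive end, at most one range per
tag is active at a time, and the precedence list is just an example.

```python
def serialize_by_precedence(text, ranges, precedence):
    """Emit well-nested markup with higher-precedence ranges outermost.

    ranges: list of (tag, start, end) tuples, end exclusive.
    precedence: list of tags, highest precedence first.
    """
    rank = {tag: i for i, tag in enumerate(precedence)}
    out = []
    stack = []  # open tags, outermost (highest precedence) first
    for pos in range(len(text) + 1):
        active = [tag for tag, start, end in ranges if start <= pos < end]
        wanted = sorted(active, key=rank.__getitem__)
        # Tags that are already open in the right place stay untouched.
        keep = 0
        while (keep < len(stack) and keep < len(wanted)
               and stack[keep] == wanted[keep]):
            keep += 1
        # Rule II: close lower precedence first.
        for tag in reversed(stack[keep:]):
            out.append(f"</{tag}>")
        # Rule I: open higher precedence first.
        for tag in wanted[keep:]:
            out.append(f"<{tag}>")
        stack = wanted
        if pos < len(text):
            out.append(text[pos])
    return "".join(out)


ranges = [("emph", 0, 3), ("RTL", 0, 7)]
print(serialize_by_precedence("abc def ghi", ranges, ["RTL", "emph"]))
# <RTL><emph>abc</emph> def</RTL> ghi
```

With the precedence list reversed to ["emph", "RTL"], the same input
yields the other rendering from the Bidi example,
<emph><RTL>abc</RTL></emph><RTL> def</RTL> ghi, which shows that the
chosen order alone decides the outcome.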
These rules will give the correct behavior, if such an "attribute
precedence" really does make sense (which I'm not sure about, but it
does if the only problem is with Bidi). And now that we have an order,
(1) is no longer a problem.
Dov