Here's an outline for how we could deal with overlapping ranges, using character attributes.

First, we must understand exactly what the problem is: it is currently perfectly possible (and should remain so) to have markup in the GUI which would look like this:

<b>abc <i>def</b> ghi</i>

However, this is illegal XML (and LaTeX?): a range is not allowed to start inside another range and then extend beyond the end of that range.

Note, however, that this is not necessarily semantically wrong (as TEI recognizes, see http://www.tei-c.org/P4X/NH.html). Also note that LyX currently allows this behavior in the GUI.

So basically, what we need to do is to take overlapping ranges and force them into legal XML/LaTeX, without changing the semantics.

The trivial way to do this is just to say: whichever range starts first I consider to be the outer range, and when I reach its end, I must also close all the inner ranges. If some of those inner ranges are still open, I reopen them after closing the outer range; they are now outside it, so everything is OK.
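
To make the naive scheme concrete, here is a minimal Python sketch. It is only an illustration, not LyX code: the function name and the input format (a list of (name, start, end) tuples with half-open character offsets, non-empty and inside the text) are assumptions made up for this example.

```python
def naive_nesting(text, ranges):
    """Naive scheme: whichever range was opened first is the outer one.

    `ranges` is a list of (name, start, end) tuples, half-open offsets.
    This input format is a hypothetical one chosen for illustration.
    """
    out = []
    stack = []  # currently open ranges, outermost (earliest opened) first
    for pos in range(len(text) + 1):
        # Indices of open ranges that end at this position.
        ending = [i for i, r in enumerate(stack) if r[2] == pos]
        if ending:
            first = ending[0]  # outermost range that ends here
            # Inner ranges that have to be closed too, but are not finished.
            still_open = [r for r in stack[first:] if r[2] != pos]
            for r in reversed(stack[first:]):
                out.append(f"</{r[0]}>")
            del stack[first:]
            # Reopen the unfinished ranges; they are now outside.
            for r in still_open:
                out.append(f"<{r[0]}>")
                stack.append(r)
        # Open ranges starting here, in whatever order the input lists them.
        for r in ranges:
            if r[1] == pos:
                out.append(f"<{r[0]}>")
                stack.append(r)
        if pos < len(text):
            out.append(text[pos])
    return "".join(out)


print(naive_nesting("abc def ghi", [("b", 0, 7), ("i", 4, 11)]))
```

For the overlapping <b>/<i> example this prints <b>abc <i>def</i></b><i> ghi</i>. Note that ranges which start at the same position are simply opened in the order in which the input happens to list them.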

However, there are two problems with this naive approach:

1) What if we don't know which starts first? For example, if at a given position in the GUI we toggled both <b> and <i> on, then which is the "first" and which is "second"? Same question if I just closed an outer range, and now have to reopen multiple inner ranges which were inside it --- which of those inner ranges is first?

2) Are we sure that this preserves the semantic meaning of the original text?

Regarding (2), the only example that I can think of is Bidi text, if language is treated as a character attribute (which it currently is, and rightly so, IMO). In this case, it matters very much whether I render:

text:  abc def ghi
RTL:   *******
emph:  ***

as

<RTL><emph>abc</emph> def</RTL> ghi

or as

<emph><RTL>abc</RTL></emph><RTL> def</RTL> ghi

Since the first would appear visually (CAPS == RTL, underline == emph) as:

FED CBA ghi
    ___

and the second would appear as:

CBA fed ghi
___

(the first is correct, in this case).

So yes, it does sometimes matter, though I can't think of any non-bidi cases where this is so.

So here's my suggested solution, which would solve both issues (1) and (2):

We define an "attribute precedence" order. Then, we use the following rules (to be applied when moving from the GUI to non-overlapping markup) to make sure that at any given position, the highest-precedence active range is also the outermost one:

I) When opening two different ranges at the same position, we first open the one with higher precedence, and only then do we open the one with lower precedence.

II) When closing two different ranges at the same position, we first close the one with lower precedence.

III) When opening a range while other ranges are active, we first close all active lower-precedence ranges (according to rule II), then open the current range, and then reopen all the ranges which were closed (according to rule I).

IV) When closing a range while other ranges are active, we first close all active lower-precedence ranges (according to rule II), then close the current range, and then reopen all the ranges which were closed (according to rule I).
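
Here is one way the precedence rules could be sketched in Python. Again this is only an illustration under stated assumptions, not LyX code: the function name, the (name, start, end) input format with half-open offsets, and the idea of passing the precedence order as a list (highest first) are all hypothetical. The sketch keeps the open tags sorted by precedence at every position, which has the same effect as applying rules I to IV.

```python
def to_nested_markup(text, ranges, precedence):
    """Serialize overlapping ranges so higher precedence is always outermost.

    `ranges` is a list of (name, start, end) tuples, half-open offsets.
    `precedence` lists attribute names, highest precedence first.
    Both formats are assumptions made for this illustration.
    """
    rank = {name: i for i, name in enumerate(precedence)}
    out = []
    stack = []  # open tags, outermost (highest precedence) first
    for pos in range(len(text) + 1):
        # Attributes that cover the character at this position.
        active = {name for name, start, end in ranges if start <= pos < end}
        # Rule I: higher precedence is opened first, so it ends up outside.
        wanted = sorted(active, key=rank.__getitem__)
        # The outer tags that are already open and correctly placed stay open.
        keep = 0
        while (keep < len(stack) and keep < len(wanted)
               and stack[keep] == wanted[keep]):
            keep += 1
        # Rule II: close from the inside out, lowest precedence first.
        while len(stack) > keep:
            out.append(f"</{stack.pop()}>")
        # Rules III and IV: open the new range (if any) and reopen whatever
        # had to be closed but is still active, highest precedence first.
        for name in wanted[keep:]:
            out.append(f"<{name}>")
            stack.append(name)
        if pos < len(text):
            out.append(text[pos])
    return "".join(out)


ranges = [("emph", 0, 3), ("RTL", 0, 7)]
print(to_nested_markup("abc def ghi", ranges, ["RTL", "emph"]))
```

With RTL given the highest precedence, this prints <RTL><emph>abc</emph> def</RTL> ghi regardless of the order in which the two ranges are listed. For the <b>/<i> example, the precedence order ["i", "b"] gives <b>abc </b><i><b>def</b> ghi</i>, which shows rule III closing and reopening the lower-precedence range.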

Right now, Bidi gets the highest precedence. Other than that, the order doesn't really matter, but we should define one anyway, just to make the behavior deterministic.

These rules will give the correct behavior, if such an "attribute precedence" really does make sense (which I'm not sure about, but it does if the only problem is with Bidi). And now that we have an order, (1) is no longer a problem.

Dov
