While working on some inline layout stuff, I've run into
nsJISx4051LineBreaker (which we use for all line breaking, actually).

Apparently it's intended to work like this: if a word (delimited by
whitespace) contains at least one CJK character, then we apply JISX4051
rules to break within that word, otherwise we just use the whole
whitespace-delimited word. JISX4051 changes behaviour even for non-CJK
text; for example, it allows breaking after commas in Latin text. So given
aaaaaa,bbbbbbbbbbbbb,ccccccccccccccccccc,dddddddddddddddd,<CJK>
we'll allow breaking after all those commas.
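To make the word-level trigger concrete, here is a rough Python model of the behaviour described above. It is not the nsJISx4051LineBreaker code. The helper names (`is_cjk`, `toy_jisx4051`, `break_offsets`) are hypothetical. The CJK test covers only a few Unicode blocks. JIS X 4051 is reduced to a toy rule: break between two CJK characters, or after a comma.

```python
from typing import List

def is_cjk(ch: str) -> bool:
    """Rough CJK test over a few common blocks (an assumption, not Gecko's table)."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
            or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
            or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
            or 0xFF00 <= cp <= 0xFFEF)  # Halfwidth and Fullwidth Forms

def toy_jisx4051(before: str, after: str) -> bool:
    """Toy stand-in for JIS X 4051: break between two CJK chars or after a comma."""
    return (is_cjk(before) and is_cjk(after)) or before == ","

def break_offsets(text: str) -> List[int]:
    """Offsets in text where a break is allowed *inside* a word.

    Words are split on single spaces for simplicity. A word with no CJK
    character gets no internal breaks. A word containing any CJK character
    gets toy_jisx4051 applied at every internal boundary, Latin ones included.
    """
    offsets = []
    pos = 0  # offset of the current word within text
    for word in text.split(" "):
        if any(is_cjk(c) for c in word):
            for i in range(1, len(word)):
                if toy_jisx4051(word[i - 1], word[i]):
                    offsets.append(pos + i)
        pos += len(word) + 1  # skip the word and the space after it
    return offsets

print(break_offsets("aaa,bbb,ccc,\u4e00"))  # [4, 8, 12]
print(break_offsets("aaa,bbb,ccc,"))        # []
```

Deleting the single trailing ideograph removes every break after the Latin commas, including ones that may sit several lines earlier.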

This is nasty and actually has many bugs on trunk. For example, it means
that removing the CJK text from the end of the run --- which could be
several lines after the start of the run --- requires us to reflow the
entire run. This messes with incremental line reflow because normally
content much later in a paragraph can't affect the layout of previous
lines. Unfortunately there is no way around this in general; in
particular, Thai word breaking apparently requires dictionary-based
analysis of the entire paragraph, so a really good algorithm will adjust
breaks globally based on the contents of multiple lines.

Some of the trunk bugs are due to this multiline reflow issue. Other
trunk bugs are due to the fact that the linebreaker's CanBreakBetween
scans in both directions looking for CJK characters to trigger CJK
rules, but Next only scans forwards and Prev only scans backwards. So
CJK rules may or may not be triggered for a given chunk of text
depending on which API call you use.
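The inconsistency can be shown with a small model in which the only difference between calls is the direction of the CJK scan. In the real class, Next and Prev return break positions. This sketch reduces them to one boundary test with direction flags, so it models only the scan direction. `breakable`, `word_bounds` and `toy_rule` are hypothetical names, and "break after a comma" stands in for the JIS X 4051 rules.

```python
def is_cjk(ch: str) -> bool:
    # Rough check: CJK Unified Ideographs block only (an assumption).
    return 0x4E00 <= ord(ch) <= 0x9FFF

def word_bounds(text: str, i: int):
    """Start and end of the space-delimited word around boundary i."""
    start = text.rfind(" ", 0, i) + 1
    end = text.find(" ", i)
    return start, (len(text) if end == -1 else end)

def toy_rule(text: str, i: int) -> bool:
    # Toy stand-in for JIS X 4051: allow a break right after a comma.
    return text[i - 1] == ","

def breakable(text: str, i: int, look_back: bool, look_ahead: bool) -> bool:
    """Apply toy_rule at boundary i only if a CJK char is found in the scanned direction(s)."""
    start, end = word_bounds(text, i)
    triggered = ((look_back and any(is_cjk(c) for c in text[start:i]))
                 or (look_ahead and any(is_cjk(c) for c in text[i:end])))
    return triggered and toy_rule(text, i)

t = "\u4e00aaa,bbb"  # CJK before the comma; boundary 5 is right after it
# Both directions (CanBreakBetween-like), forward only (Next-like), backward only (Prev-like):
print(breakable(t, 5, True, True), breakable(t, 5, False, True), breakable(t, 5, True, False))
# True False True
```

Moving the ideograph to the end of the word (`"aaa,bbb\u4e00"`, boundary 4) flips the forward-only and backward-only answers, while the bidirectional answer stays True.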

I'm tempted to use the simplifying assumption that breaking between two
non-CJK chars should use non-CJK rules, but I'm not sure of all the
consequences of that. It basically means we won't break in some places where
we probably should. For example, given
<CJK><CJK>,300<CJK><CJK>
we really want to be able to break after the comma. Worse, sequences of
(<CJK>)(<CJK>)(<CJK>)(<CJK>)(<CJK>)(<CJK>)(<CJK>)
won't break anywhere. Even if that assumption is tenable, because of the
Thai issue we'll eventually have to do some nasty stuff, if not now.
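Here is a rough model of the proposed simplification applied to the two examples above. CJK-style rules are consulted only when at least one of the two adjacent characters is CJK. Otherwise there is no break inside the word. The names (`can_break_pair`, `pair_rule_offsets`) are hypothetical. The bracket and comma sets are a toy subset of the JIS X 4051 line-start and line-end restrictions, not the real tables.

```python
from typing import List

OPENING = "("    # toy subset: no break after these
CLOSING = "),"   # toy subset: no break before these

def is_cjk(ch: str) -> bool:
    # Rough check: CJK Unified Ideographs block only (an assumption).
    return 0x4E00 <= ord(ch) <= 0x9FFF

def can_break_pair(before: str, after: str) -> bool:
    """Proposed simplification: CJK-style rules apply only if a neighbour is CJK."""
    if not (is_cjk(before) or is_cjk(after)):
        return False  # two non-CJK chars: whole-word rules, no break inside a word
    # Toy restriction: never break after an opening bracket or before closing punctuation.
    return before not in OPENING and after not in CLOSING

def pair_rule_offsets(word: str) -> List[int]:
    """Internal break offsets of a single word containing no whitespace."""
    return [i for i in range(1, len(word)) if can_break_pair(word[i - 1], word[i])]

print(pair_rule_offsets("\u4e00\u4e8c,300\u4e09\u56db"))  # [1, 6, 7]: nothing at 3, after the comma
print(pair_rule_offsets("(\u4e00)(\u4e8c)(\u4e09)"))      # []: no break anywhere
```

The comma and the `3` are both non-CJK, so the break after the comma is lost. In the bracketed sequence, every boundary either touches a bracket or joins `)` to `(`, so nothing breaks.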

For now I'm going to go with it, because it won't cause breaks in bad
places and the alternative is to do a lot of work that I really hadn't
planned on (fixing the linebreaker, fixing block reflow, and adding whatever
optimizations are necessary to make things not suck), but I'd appreciate
feedback on this.

Rob
_______________________________________________
dev-tech-layout mailing list
https://lists.mozilla.org/listinfo/dev-tech-layout
