While working on some inline layout stuff, I've run into nsJISx4051LineBreaker (which we use for all line breaking, actually).
Apparently it's intended to work like this: if a word (delimited by whitespace) contains at least one CJK character, we apply JISX4051 rules to break within that word; otherwise we just use the whole whitespace-delimited word. JISX4051 changes behaviour even for non-CJK text; for example, it allows breaking after commas in Latin text. So given aaaaaa,bbbbbbbbbbbbb,ccccccccccccccccccc,dddddddddddddddd,<CJK> we'll allow breaking after all those commas.

This is nasty and actually has many bugs on trunk. For example, it means that removing the CJK text from the end of the run (which could be several lines after the start of the run) requires us to reflow the entire run. This messes with incremental line reflow, because normally content much later in a paragraph can't affect the layout of previous lines. Unfortunately there is no way around this in general; in particular, Thai word breaking apparently requires dictionary-based analysis of the entire paragraph, so a really good algorithm will adjust breaks globally based on the contents of multiple lines.

Some of the trunk bugs are due to this multiline reflow issue. Others are due to the fact that the linebreaker's CanBreakBetween scans in both directions looking for CJK characters to trigger CJK rules, but Next only scans forwards and Prev only scans backwards. So CJK rules may or may not be triggered for a given chunk of text depending on which API call you use.

I'm tempted to use the simplifying assumption that breaking between two non-CJK chars should use non-CJK rules, but I'm not sure of all the consequences of that. It basically means we'll not break in places where maybe we should. For example, given <CJK><CJK>,300<CJK><CJK> we really want to be able to break after the comma. Worse, sequences of (<CJK>)(<CJK>)(<CJK>)(<CJK>)(<CJK>)(<CJK>)(<CJK>) won't break anywhere. And even if that assumption is tenable, because of the Thai issue we'll eventually have to do some nasty stuff anyway, if not now.
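To make the simplifying assumption concrete, here is a rough Python sketch (not the actual nsJISx4051LineBreaker code). The names is_cjk, can_break_between and break_positions are hypothetical, the CJK test only covers a few Unicode blocks, and the two bracket/comma sets are a tiny stand-in for the real JISX4051 pair rules. The point is that the decision looks only at the two adjacent characters, so it can't depend on scan direction or on text several lines away.

```python
# Stand-in for the real JIS X 4051 pair tables (assumption: heavily simplified).
NO_BREAK_AFTER = set("([{")      # never break right after an opening bracket
NO_BREAK_BEFORE = set(")]},.")   # never break right before these


def is_cjk(ch: str) -> bool:
    """Very rough CJK test (assumption): a few common Unicode blocks only."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
    )


def can_break_between(text: str, i: int) -> bool:
    """May a line break go between text[i - 1] and text[i]?"""
    before, after = text[i - 1], text[i]
    # Non-CJK rule: break only at the end of a whitespace run.
    if before.isspace() or after.isspace():
        return before.isspace() and not after.isspace()
    # Simplifying assumption: two non-CJK chars never trigger CJK rules.
    if not (is_cjk(before) or is_cjk(after)):
        return False
    # At least one side is CJK: apply the (stand-in) pair rules.
    if before in NO_BREAK_AFTER or after in NO_BREAK_BEFORE:
        return False
    return True


def break_positions(text: str) -> list[int]:
    """All indexes i such that a break is allowed before text[i]."""
    return [i for i in range(1, len(text)) if can_break_between(text, i)]
```

With these stand-in rules, break_positions("aaaaaa,bbbbbb") and break_positions("(日)(本)(語)") both come back empty, and "日本,300円" gets no break after the comma, which are exactly the missed opportunities described above.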
For now I'm going to go with that assumption, because it won't cause breaks in bad places and the alternative is to do a lot of work that I really hadn't planned on (fixing the linebreaker, fixing block reflow, and adding whatever optimizations are necessary to make things not suck), but I'd appreciate feedback on this.

Rob
_______________________________________________
dev-tech-layout mailing list
https://lists.mozilla.org/listinfo/dev-tech-layout

