Line breaking
=============
I'd like to implement the UAX #14 algorithm for line breaking; see http://www.unicode.org/reports/tr14/. The algorithm is relatively simple since it's pair-based: whether a break is allowed between two characters depends on the pair of line-breaking classes they belong to. This would be vastly more advanced than what we have now, which breaks only on spaces for Western text. We'd probably also want to make nsTextTransformer transform away a few more characters than it does now, and also handle soft hyphens (U+00AD). (Well, there was an attempt to implement it correctly, but it was really broken -- see bug 187899 comment 7.)
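To make "pair-based" concrete, here is a minimal sketch of the idea, not the real UAX #14 table. The class names are borrowed from the report, but `classify`, `DIRECT`, and `NO_BREAK_BEFORE` are hypothetical and cover only four toy classes; the entries are illustrative assumptions chosen to show the lookup mechanism.

```python
# Toy line-breaking classes. The real UAX #14 defines many more, and
# this character-to-class mapping is a simplifying assumption.
def classify(ch):
    if ch == " ":
        return "SP"   # space
    if ch == "/":
        return "SY"   # solidus
    if ch == "-":
        return "HY"   # hyphen-minus
    return "AL"       # everything else is treated as alphabetic

# (before, after) class pairs where a break is allowed with no space
# in between. Illustrative entries only.
DIRECT = {("HY", "AL"), ("SY", "AL")}

# Classes that forbid a break before them even after spaces.
NO_BREAK_BEFORE = {"SY"}

def break_opportunities(text):
    """Return every index i such that a line may break before text[i]."""
    result = []
    prev = None          # class of the last non-space character seen
    seen_space = False   # True if spaces separate prev from the current char
    for i, ch in enumerate(text):
        cls = classify(ch)
        if cls == "SP":
            seen_space = True
            continue
        if prev is not None:
            if seen_space:
                # Break after a run of spaces unless the class forbids it.
                if cls not in NO_BREAK_BEFORE:
                    result.append(i)
            elif (prev, cls) in DIRECT:
                result.append(i)
        prev = cls
        seen_space = False
    return result
```

With this toy table, `break_opportunities("a b")` yields a break after the space, while `break_opportunities("s/he")` yields one right after the slash. Every opportunity is treated as equally good here; the sketch does no ranking.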
Two points:
1. In discussions on line breaking on www-style, Jukka Korpela raised
some criticism [1] of the UAX #14 algorithm. For example, breaks
are disallowed before slashes ('/') even with intervening spaces,
resulting in some weird line breaks [2]. So the UAX #14 algorithm
should be taken with some reservation.
2. If we're breaking in places that aren't spaces, we need to
prioritize break points. It doesn't need to be complex, but it
needs to be there, or we'll be breaking things like "s/he" and
the "-a" Jukka mentions in [2]. A simple prioritizing algorithm
like the one outlined in [3] would suffice. (Though in the context
of Mozilla, it may not be quite so simple. ;)

[1] http://www.cs.tut.fi/~jkorpela/unicode/linebr.html
[2] http://lists.w3.org/Archives/Public/www-style/2003May/0014.html
    (Fifth reply section, with /usr/spool example.)
[3] http://lists.w3.org/Archives/Public/www-style/2003May/0010.html
    ("As a simplistic example...")

~fantasai
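
As a rough illustration of point 2, here is one possible way to prioritize break points. It is a sketch under my own assumptions and not necessarily the scheme outlined in [3]: the `wrap` function, the numeric priorities, and the greedy fill are all hypothetical. Each break opportunity carries a priority, and a lower-priority break (say, after a slash) is used only when no higher-priority break (say, after a space) fits on the line.

```python
def wrap(text, width, opportunities):
    """Greedy line wrap that prefers high-priority break points.

    `opportunities` maps a break index i (break before text[i]) to a
    priority; a higher number means a more desirable break.
    """
    lines = []
    start = 0
    while len(text) - start > width:
        # Break points that keep the line (minus trailing spaces) in width.
        fitting = [i for i in opportunities
                   if i > start and len(text[start:i].rstrip(" ")) <= width]
        if fitting:
            best = max(opportunities[i] for i in fitting)
            # Among the best-priority candidates, take the furthest one.
            cut = max(i for i in fitting if opportunities[i] == best)
        else:
            # Nothing fits: overflow up to the next opportunity, if any.
            later = [i for i in opportunities if i > start]
            cut = min(later) if later else len(text)
        lines.append(text[start:cut].rstrip(" "))
        start = cut
    if start < len(text):
        lines.append(text[start:])
    return lines


text = "he or s/he ran"
# Priority 2 after a space, priority 1 after the slash (assumed values).
ranked = {3: 2, 6: 2, 8: 1, 11: 2}
# The same opportunities with no ranking at all.
flat = {i: 1 for i in ranked}
```

At a width of 8, `wrap(text, 8, ranked)` keeps "s/he" together on the second line, whereas `wrap(text, 8, flat)` fills the first line as far as possible and splits it after the slash. A real implementation would measure rendered width rather than character counts.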
