L. David Baron wrote:

Line breaking
=============

I'd like to implement the UAX #14 algorithm for line breaking.  See
http://www.unicode.org/reports/tr14/ .  The algorithm is relatively
simple since it's pair-based.  This would be vastly more advanced than
what we now have, which breaks only on spaces for western text.  We'd
probably also want to make nsTextTransformer transform away a few more
characters than it does now, and also handle &shy;.  (Well,
there was an attempt to implement &nbsp; correctly, but it was really
broken -- see bug 187899 comment 7.)

Two points:


  1. In discussions on line breaking on www-style, Jukka Korpela brought
     up some criticism [1] of the UAX #14 algorithm. For example, breaks
     are disallowed before a slash '/' even when spaces precede it,
     resulting in some weird line breaks [2].

     So the UAX #14 algorithm should be taken with some reservation.

  2. If we're breaking in places that aren't spaces, we need to
     prioritize break points. It doesn't need to be complex, but it
     needs to be there, or we'll be breaking things like "s/he" and
     the "-a" Jukka mentions in [2].

     A simple prioritizing algorithm like the one outlined in [3]
     would suffice. (Though in the context of Mozilla, it may not be
     quite so simple. ;)
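
To make point 2 a bit more concrete, here is a rough sketch in Python of
what prioritizing break points could look like. Everything in it is an
assumption made for illustration: the three break classes, their
priority values, and the function names are invented, it is not the UAX
#14 pair table, it is not the algorithm from [3], and it has nothing to
do with actual Mozilla code. The idea is only that a break after a slash
or hyphen is used when no break at a space fits on the line.

```python
# Hypothetical break classes; a lower number means a more preferred break.
SPACE, HYPHEN, SLASH = 0, 1, 2


def break_opportunities(text):
    """Yield (index, priority); breaking at index ends the line with text[:index]."""
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if cur == " ":
            continue  # never start the next line with a space
        if prev == " ":
            yield i, SPACE
        elif prev == "-":
            yield i, HYPHEN
        elif prev == "/":
            yield i, SLASH


def choose_break(text, width):
    """Return the index at which to end the first line, or None.

    None means the text already fits, or no break opportunity fits.
    Among the opportunities that fit in `width`, the best priority wins;
    ties go to the latest one so the line is as full as possible.
    """
    if len(text) <= width:
        return None
    best = None
    for index, priority in break_opportunities(text):
        # Trailing spaces hang off the end of the line and are not counted.
        if len(text[:index].rstrip(" ")) > width:
            break
        if best is None or priority <= best[1]:
            best = (index, priority)
    return best[0] if best else None


if __name__ == "__main__":
    for sample, width in [("ask if s/he agrees", 9),
                          ("use the -a flag", 9),
                          ("/usr/spool/mail", 8)]:
        cut = choose_break(sample, width)
        print(repr(sample[:cut]), "|", repr(sample[cut:]))
```

With this, "s/he" and "-a" stay intact as long as a space break fits on
the line, while a string without spaces such as "/usr/spool/mail" can
still be broken after a slash. Always preferring a space can leave a
short line, so a real implementation would probably weigh priority
against how full the line is.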

[1] http://www.cs.tut.fi/~jkorpela/unicode/linebr.html
[2] http://lists.w3.org/Archives/Public/www-style/2003May/0014.html
    (Fifth reply section, with /usr/spool example.)
[3] http://lists.w3.org/Archives/Public/www-style/2003May/0010.html
    ("As a simplistic example...")

~fantasai





