In a previous post Joerg pointed to the Unicode Standard Annex #14 on
Line Breaking (http://www.unicode.org/reports/tr14/) and his initial
I have since had a closer look at both UAX#14 and Joerg's code. Because I
liked what I saw, I went about adapting Joerg's code to Unicode 4.1
and added fairly extensive JUnit test cases to it, mainly because it
really helps to go through the various cases mentioned in the
spec in a structured fashion.
The results are now available for public inspection:
1. I would like to propose that Unicode conformant line breaking be
integrated into FOP trunk because it:
a) Moves FOP more towards being a universal formatter and not just a
formatter for western languages
b) Moves FOP more towards becoming a high quality typesetting system
(something that was really started by integrating Knuth style breaking)
The reason I think this needs to be voted on is that Unicode line
breaking will change the current line breaking behaviour in subtle ways
and therefore constitutes a (significant) change in FOP's overall
2. I would also like to propose that the Unicode conformant line
breaking be implemented using our own pair-table based implementation
and not using Java's line breaker, because:
a) It gives us full control and allows FOP to follow the Unicode
standard (and its updates and errata) closely, and therefore keeps FOP's
Unicode compliance level independent of the Java version.
b) It allows us to tailor the algorithm to match the needs of XSL-FO and
c) It allows us to provide user customisation features (down the track)
that are not available through the Java APIs.
Of course there are downsides, like:
a) Are we falling for the 'not invented here' syndrome?
b) Duplicating code which is already in the Java base system
c) Increasing the memory footprint of FOP
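
To make "pair-table based" a little more concrete, here is a minimal
sketch of the idea. It is not Joerg's code and not the UAX#14 table: the
class name PairTableSketch, the classify() helper, the reduction to five
line breaking classes and the table entries themselves are all
assumptions made for illustration, and any real table would have to be
checked against the annex. It only shows the mechanism: look up the
classes before and after a position, and let spaces turn an indirect
break into an allowed one.

```java
import java.util.ArrayList;
import java.util.List;

class PairTableSketch {
    // Heavily reduced set of line breaking classes (UAX#14 defines many more).
    static final int OP = 0, CL = 1, AL = 2, BA = 3, SP = 4;

    // Possible actions for a (before, after) class pair.
    static final int DIRECT = 0;     // break allowed, with or without spaces
    static final int INDIRECT = 1;   // break allowed only if spaces intervene
    static final int PROHIBITED = 2; // no break, even across spaces

    // Illustrative entries only; rows = class before, columns = class after.
    static final int[][] PAIRS = {
        //           OP          CL          AL          BA
        /* OP */ {PROHIBITED, PROHIBITED, PROHIBITED, PROHIBITED},
        /* CL */ {DIRECT,     PROHIBITED, INDIRECT,   INDIRECT},
        /* AL */ {INDIRECT,   PROHIBITED, INDIRECT,   INDIRECT},
        /* BA */ {DIRECT,     PROHIBITED, DIRECT,     INDIRECT},
    };

    // Hypothetical classifier; a real one would use the Unicode property data.
    static int classify(char c) {
        switch (c) {
            case '(': return OP;
            case ')': return CL;
            case '-': return BA;
            case ' ': return SP;
            default:  return AL;
        }
    }

    // Returns the indices i at which a line may start (break before char i).
    static List<Integer> breakOpportunities(String text) {
        List<Integer> result = new ArrayList<>();
        if (text.isEmpty()) {
            return result;
        }
        int prev = classify(text.charAt(0));
        if (prev == SP) {
            prev = AL; // simplification: treat a leading space like a letter
        }
        boolean spacesSeen = false;
        for (int i = 1; i < text.length(); i++) {
            int cur = classify(text.charAt(i));
            if (cur == SP) {
                spacesSeen = true; // never break before a space
                continue;
            }
            int action = PAIRS[prev][cur];
            if (action == DIRECT || (action == INDIRECT && spacesSeen)) {
                result.add(i);
            }
            prev = cur;
            spacesSeen = false;
        }
        return result;
    }
}
```

With these toy entries, breakOpportunities("a (b) c-d") returns
[2, 6, 8]: before the opening parenthesis, before "c", and after the
hyphen. Having the table in our own code is what would let us tailor
individual entries, which is the point of 2b) and 2c) above.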
3. Assuming we get enough +1s for the above proposals, the first item to
decide after that would be: Where should the code live?
a) Joerg would like to see it in Jakarta Commons but hasn't got the time
to start the project.
b) Jeremias suggested XMLGraphics Commons.
c) Personally I think it is too early to factor it out. More experience
with its design and use cases should be gathered before making it
standalone, and at this point in time it really is only 2 core Java
classes. I would like to suggest that it initially lives under FOP in
something like org.apache.fop.text. Should the need and energy levels
(= developer enthusiasm) arise later to make this into a
Jakarta Commons or XMLGraphics Commons project, so be it.
Assuming now that this will be agreed as well, the next step would be the
more detailed design of the integration. But this is well beyond the
scope of this e-mail, as there are some tricky issues involved and they
probably need to be tackled in conjunction with the white space
handling issues. Many of the problems are related to our LayoutManager
structures, which create barriers when character sequences need to be
processed across LayoutManager boundaries, as is the case for both
line breaking and white space handling. Add to that the design of the
different Knuth sequences required to model the different break cases
in conjunction with conditional border/padding, white space removal
around line breaks and the different types of line justification, and
there is some real work ahead.
Should add my votes: