Re: Unicode compliant Line Breaking

Jeremias Maerki Mon, 07 Nov 2005 03:05:49 -0800

1. +1
2. +1
3.b) +1 for the separable parts, although c) is also OK for now.


+1 to try to find synergies with the code in Batik.

If I were you I'd create a branch and put your stuff in there. It's
easier for everyone to follow and to help (wishful thinking).

On 31.10.2005 08:25:12 Manuel Mall wrote:
> In a previous post Joerg pointed to the Unicode Standard Annex #14 on 
> Line Breaking (http://www.unicode.org/reports/tr14/) and his initial 
> implementation: http://people.apache.org/~pietsch/linebreak.tar.gz.
> 
> I have since had a closer look at both UAX#14 and Joerg's code. Because I 
> liked what I saw, I went about adapting Joerg's code to Unicode 4.1 
> and added fairly extensive JUnit test cases to it, mainly because it 
> really helps to go through the various cases mentioned in the 
> spec in a structured fashion.
> 
> The results are now available for public inspection: 
> http://people.apache.org/~manuel/fop/linebreak.tar.gz
> 
> 1. I would like to propose that Unicode conformant line breaking be 
> integrated into FOP trunk because it:
> a) Moves FOP more towards being a universal formatter and not just a 
> formatter for western languages
> b) Moves FOP more towards becoming a high quality typesetting system 
> (something that was really started by integrating Knuth style breaking)
> The reason I think this needs to be voted on is because Unicode line 
> breaking will in subtle ways change the current line breaking behaviour 
> and therefore constitutes a (significant) change in FOP's overall 
> rendering.
> 
> 2. I would also like to propose that the Unicode conformant line 
> breaking be implemented using our own pair-table based implementation 
> and not using Java's line breaker, because:
> a) It gives us full control and allows FOP to follow the Unicode 
> standard (and its updates and errata) closely, and therefore keeps FOP's 
> Unicode compliance level independent of the Java version.
> b) It allows us to tailor the algorithm to match the needs of XSL-FO and 
> FOP.
> c) It allows us to provide user customisation features (down the track) 
> not available through using the Java APIs.
> 
> Of course there are downsides, like:
> a) Are we falling for the 'not invented here' syndrome?
> b) Duplicating code which is already in the Java base system
> c) Increasing the memory footprint of FOP
> 
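For readers unfamiliar with the term, here is a minimal sketch of what a
"pair-table based" line breaker does. It is not Joerg's or Manuel's code:
the class name PairTableSketch, the five toy classes and the table entries
are assumptions made for illustration only, loosely modelled on UAX#14,
which defines many more classes and rules. The idea is that each character
is mapped to a line-breaking class, and a table indexed by the classes on
either side of a boundary says whether a break there is allowed directly,
only when spaces intervene, or never.

```java
import java.util.ArrayList;
import java.util.List;

public class PairTableSketch {
    // Toy subset of line-breaking classes (hypothetical, not the full UAX#14 set).
    enum Cls { AL, OP, CL, HY, SP }

    // DIRECT: break allowed; INDIRECT: only if spaces intervene; PROHIBITED: never.
    enum Action { DIRECT, INDIRECT, PROHIBITED }

    private static final Action D = Action.DIRECT;
    private static final Action I = Action.INDIRECT;
    private static final Action P = Action.PROHIBITED;

    // Rows: class before the boundary; columns: class after it (AL, OP, CL, HY).
    // Illustrative entries only, not copied from the standard.
    private static final Action[][] PAIRS = {
        /* AL */ { I, I, P, I },
        /* OP */ { P, P, P, P },
        /* CL */ { I, I, P, I },
        /* HY */ { D, D, P, I },
    };

    static Cls classify(char c) {
        if (c == ' ') return Cls.SP;
        if (c == '(') return Cls.OP;
        if (c == ')') return Cls.CL;
        if (c == '-') return Cls.HY;
        return Cls.AL; // everything else is treated as alphabetic
    }

    // Returns the indices before which a line break is permitted.
    static List<Integer> breakOpportunities(String text) {
        List<Integer> breaks = new ArrayList<>();
        Cls prev = null;        // class of the last non-space character
        boolean spaces = false; // spaces seen since that character?
        for (int i = 0; i < text.length(); i++) {
            Cls cur = classify(text.charAt(i));
            if (cur == Cls.SP) {
                spaces = true;
                continue;
            }
            if (prev != null) {
                Action a = PAIRS[prev.ordinal()][cur.ordinal()];
                if (a == D || (a == I && spaces)) {
                    breaks.add(i);
                }
            }
            prev = cur;
            spaces = false;
        }
        return breaks;
    }

    public static void main(String[] args) {
        // Prints [5, 11, 16, 22]
        System.out.println(breakOpportunities("well-known (see text) here"));
    }
}
```

Owning the table is what points 2a) to 2c) rely on: updating to a new
Unicode version or tailoring for XSL-FO means editing table entries and the
classification, whereas with java.text.BreakIterator.getLineInstance() the
behaviour is fixed by whichever Java version is running.
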
> 3. Assuming we get enough +1 for the above proposals the first item to 
> decide after that would be: Where should the code live?
> a) Joerg would like to see it in Jakarta Commons but hasn't got the time 
> to start the project. 
> b) Jeremias suggested XMLGraphics Commons. 
> c) Personally I think it is too early to factor it out. More experience 
> with its design and use cases should be gathered before making it 
> standalone, and at this point in time it really is only 2 core Java 
> classes. I would like to suggest that it initially lives under FOP in 
> something like org.apache.fop.text. Should the need and the energy 
> (= developer enthusiasm) arise later to make this into a 
> Jakarta Commons or XMLGraphics Commons project, so be it.
> 
> Assuming now that this will be agreed as well, the next step would be the 
> more detailed design of the integration. But this is well beyond the 
> scope of this e-mail as there are some tricky issues involved and they 
> probably need to be tackled in conjunction with the white space 
> handling issues. Many of the problems are related to our LayoutManager 
> structures which create barriers when it comes to the need to process 
> character sequences across those boundaries as is the case for both 
> line breaking and white space handling. Add to that the design of the 
> different Knuth sequences required to model the different break cases 
> in conjunction with conditional border/padding and white space removal 
> around line breaking and different types of line justifications and 
> there is some real work ahead.
> 
> Cheers
> 
> Manuel
> 
> Should add my votes:
> 
> 1.) +1
> 2.) +1
> 3.c) +1



Jeremias Maerki
