On Mon, Oct 31, 2005 at 03:25:12PM +0800, Manuel Mall wrote:
> In a previous post Joerg pointed to the Unicode Standard Annex #14 on
> Line Breaking (http://www.unicode.org/reports/tr14/) and his initial
> implementation: http://people.apache.org/~pietsch/linebreak.tar.gz.
>
> I had since a closer look at both UAX#14 and Joerg's code. Because I
> liked what I saw I went about adapting Joerg's code it to Unicode 4.1
> and added fairly extensive JUnit test cases to it mainly because it
> really helps to go through the various different cases mentioned in the
> spec in some structured fashion.

Is our current hyphenation method a subset of Unicode's method?

> Assuming now that this will be agreed as well the next step would be the
> more detailed design of the integration. But this is well beyond the
> scope of this e-mail as there are some tricky issues involved and they
> probably need to be tackled in conjunction with the white space
> handling issues. Many of the problems are related to our LayoutManager
> structures which create barriers when it comes to the need to process
> character sequences across those boundaries as is the case for both
> line breaking and white space handling. Add to that the design of the

I seem to recall that the hyphenation code collects words across LM
boundaries.

Implementing Unicode hyphenation seems a useful goal. But since it is a
major effort, it does not fit into the work towards a release. In any
case it would have to live in a separate branch until it proves to work
and implements a substantial part of hyphenation. Until then it does not
immediately matter whether it is a separate project or a part of FOP.

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl
