Re: Unicode compliant Line Breaking

Simon Pepping Tue, 01 Nov 2005 13:22:25 -0800

On Mon, Oct 31, 2005 at 03:25:12PM +0800, Manuel Mall wrote:
> In a previous post Joerg pointed to the Unicode Standard Annex #14 on 
> Line Breaking (http://www.unicode.org/reports/tr14/) and his initial 
> implementation: http://people.apache.org/~pietsch/linebreak.tar.gz.
> 
> I have since had a closer look at both UAX#14 and Joerg's code. Because I 
> liked what I saw, I went about adapting Joerg's code to Unicode 4.1 
> and added fairly extensive JUnit test cases to it, mainly because it 
> really helps to work through the various cases mentioned in the 
> spec in a structured fashion.


Is our current hyphenation method a subset of Unicode's method?
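
For comparison: as far as I understand it, UAX#14 identifies line-break opportunities between characters (after spaces, after explicit hyphens, and so on). It does not look for hyphenation points inside words. A minimal sketch of that distinction uses the JDK's java.text.BreakIterator line instance. This is not Joerg's code, and how closely the JDK follows UAX#14 depends on the JDK version.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LineBreakDemo {
    /**
     * Returns every offset at which the JDK's line-break iterator permits a break.
     * The last entry is always text.length().
     */
    static List<Integer> breakOpportunities(String text) {
        BreakIterator it = BreakIterator.getLineInstance(Locale.ENGLISH);
        it.setText(text); // also resets the iterator to offset 0
        List<Integer> offsets = new ArrayList<>();
        // next() walks forward one boundary at a time; DONE marks the end of the text.
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            offsets.add(end);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "Unicode line-breaking example";
        int start = 0;
        // Print each unbreakable chunk. None of them is split at a
        // hyphenation point such as "break|ing"; that is a separate job.
        for (int end : breakOpportunities(text)) {
            System.out.println("[" + text.substring(start, end) + "]");
            start = end;
        }
    }
}
```

Under this reading, hyphenation would supply additional break points inside the chunks that line breaking produces, rather than being a subset of it.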

> Assuming now that this will be agreed as well, the next step would be the 
> more detailed design of the integration. But this is well beyond the 
> scope of this e-mail as there are some tricky issues involved and they 
> probably need to be tackled in conjunction with the white space 
> handling issues. Many of the problems are related to our LayoutManager 
> structures which create barriers when it comes to the need to process 
> character sequences across those boundaries as is the case for both 
> line breaking and white space handling. Add to that the design of the 

I seem to recall that the hyphenation code collects words across LM
boundaries.
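
Collecting words across LM boundaries can be illustrated with a minimal Java sketch. For illustration only, it assumes that the text owned by each layout manager is available as a plain String. The class and method names are hypothetical and do not reflect FOP's actual hyphenation code.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch (not FOP's API): each string stands for the text held
 * by one layout manager. Joining the segments first lets a word that
 * straddles an LM boundary be found whole.
 */
public class CrossBoundaryWords {

    static List<String> collectWords(List<String> segments) {
        StringBuilder buf = new StringBuilder();
        for (String s : segments) {
            buf.append(s);
        }
        String text = buf.toString();

        BreakIterator words = BreakIterator.getWordInstance(Locale.ENGLISH);
        words.setText(text);
        List<String> result = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String piece = text.substring(start, end);
            // Keep only real words; skip runs of spaces and punctuation.
            if (Character.isLetterOrDigit(piece.codePointAt(0))) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "hyphenation" is split across two hypothetical inline LMs.
        List<String> segments = Arrays.asList("hyphen", "ation is ", "useful");
        System.out.println(collectWords(segments)); // [hyphenation, is, useful]
    }
}
```

Processing each segment on its own would instead yield "hyphen" and "ation" as separate words, which is the kind of barrier the quoted text describes for both line breaking and white space handling.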

It seems a useful goal to implement Unicode hyphenation. But since it
is a major effort, it does not fit into the work towards a release. In
any case it would have to live in a separate branch until it is proven
to work and implements a substantial part of hyphenation. Until then it
does not immediately matter whether it is a separate project or part of FOP.

Simon

-- 
Simon Pepping
home page: http://www.leverkruid.nl
