On Tue, 1 Nov 2005 01:33 am, [EMAIL PROTECTED] wrote:
> Hi all,
>
> Just an FYI, Batik also currently has an implementation of
> the Unicode TR14 word breaking alg.
> (org.apache.batik.gvt.flow.TextLineBreak).
>
> As far as performance is concerned it should be fairly fast
> as it is mostly just table based.

Thomas, thanks for the pointer. (Note to myself: I need to become more aware of what's in the Batik code base. Feeble excuse: Joerg didn't seem to know either.)

Had a look at the Batik code:
- Same algorithm as Joerg's (not surprising, as UAX#14 actually contains real C code), with very similar internal data structures.
- The data structures are hard-coded, not generated from the Unicode text files.
- The API is different; in particular, it relies on Batik-specific types being passed in rather than plain Strings (but this could probably be handled by a wrapper).
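To illustrate what "mostly just table based" means for a UAX#14-style line breaker, here is a minimal sketch of the pair-table lookup. It is not Batik's TextLineBreak API or Joerg's code: the class name PairTableSketch, the four-class subset (AL, OP, CL, SP) and the table entries are hypothetical simplifications; the real UAX#14 table covers far more classes and rules. Each character gets a class, and the break decision for each adjacent pair is a single array lookup, with spaces handled as "indirect" breaks.

```java
class PairTableSketch {
    // Toy subset of UAX#14 line-break classes (hypothetical; the spec defines many more).
    static final int AL = 0, OP = 1, CL = 2, SP = 3;

    // Break markers: '_' break allowed, '%' break only if spaces intervene,
    // '^' break prohibited. '%' appears only in the table, never in results.
    static final char DIRECT = '_', INDIRECT = '%', PROHIBITED = '^';

    // Rows: class before the position; columns: class after (AL, OP, CL).
    // SP is never a column; spaces are handled in findBreaks().
    // Entries are illustrative, not copied from the UAX#14 table.
    static final char[][] PAIRS = {
        /* AL */ { INDIRECT,   INDIRECT,   PROHIBITED },
        /* OP */ { PROHIBITED, PROHIBITED, PROHIBITED },
        /* CL */ { INDIRECT,   INDIRECT,   PROHIBITED },
    };

    // Hypothetical toy classifier standing in for the Unicode data files.
    static int classify(char c) {
        switch (c) {
            case ' ': return SP;
            case '(': case '[': case '{': return OP;
            case ')': case ']': case '}': return CL;
            default: return AL;
        }
    }

    // result[i] describes the position between text[i] and text[i + 1]:
    // '_' = line break allowed, '^' = no break.
    static char[] findBreaks(String text) {
        int n = text.length();
        char[] result = new char[Math.max(n - 1, 0)];
        if (n == 0) return result;
        int cls = classify(text.charAt(0));
        if (cls == SP) cls = AL; // toy simplification for leading spaces
        for (int i = 1; i < n; i++) {
            int next = classify(text.charAt(i));
            if (next == SP) {
                result[i - 1] = PROHIBITED; // never break before a space
                continue;                   // 'cls' keeps the class before the spaces
            }
            char brk = PAIRS[cls][next];   // the table lookup itself
            if (brk == INDIRECT) {
                // Indirect break: allowed only if the pair is separated by spaces.
                brk = (classify(text.charAt(i - 1)) == SP) ? DIRECT : PROHIBITED;
            }
            result[i - 1] = brk;
            cls = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(new String(findBreaks("ab cd"))); // prints ^^_^
        System.out.println(new String(findBreaks("a (b)"))); // prints ^_^^
    }
}
```

Since the per-pair work is one array access, speed depends mainly on how cheaply characters are classified, which is also where a wrapper taking plain Strings would sit.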
This probably strengthens the argument for making all of this part of XML Graphics Commons... grumble... grumble...

My main reason for hesitating over the XML Graphics Commons approach is simple manpower. We need to set up the infrastructure (Subversion, mailing lists, web site, etc.), and we need to maintain it. We would basically be publishing APIs that are currently internal to Batik and FOP, with all the resulting support headaches. For example, I would not like to see my time diluted at the moment by having to discuss API needs from outside FOP/Batik. Actually, I am reluctant to even dive into the Batik code base at the moment; FOP is complicated enough to digest.

Hmmm... not sure where to go from here.

Manuel
