Re: Initial Word 6/95 support

Nick Burch Fri, 02 Jul 2010 15:33:15 -0700

On Sat, 3 Jul 2010, Antoni Mylka wrote:

> Did you try to compare the Word6Extractor against the one from textmining? How well does it extract text?

For my test documents, the POI code now does better than text mining does. This is because we process not just the CHPX (character properties) table, but also the PAPX (paragraph) and SECX (section) tables, which means we know where the paragraphs are.
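To make the paragraph point concrete, here is a minimal Java sketch of what knowing the paragraph boundaries buys you. It is not POI's code: the `ParagraphSplitter` class and its `split` method are hypothetical, and the sketch assumes the PAPX runs have already been mapped to character offsets into the text (the real tables work in file positions, which need translating first).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not POI's API): split a document's text stream into
 * paragraphs using run boundaries such as those a PAPX table provides.
 * Assumes the boundaries are already character offsets into the text.
 */
public class ParagraphSplitter {

    /**
     * @param text    the full text stream of the document
     * @param runEnds exclusive end offsets of each paragraph run, ascending
     * @return one string per paragraph, with the trailing paragraph mark removed
     */
    public static List<String> split(String text, int[] runEnds) {
        List<String> paragraphs = new ArrayList<>();
        int start = 0;
        for (int end : runEnds) {
            // Clamp so a malformed table cannot push us past the text.
            int safeEnd = Math.min(end, text.length());
            if (safeEnd <= start) {
                continue; // skip empty or out-of-order runs
            }
            String para = text.substring(start, safeEnd);
            // Word ends each paragraph with a carriage return (\r).
            if (para.endsWith("\r")) {
                para = para.substring(0, para.length() - 1);
            }
            paragraphs.add(para);
            start = safeEnd;
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "Title\rFirst paragraph.\rSecond.\r";
        int[] runEnds = {6, 23, 31};
        System.out.println(split(text, runEnds)); // [Title, First paragraph., Second.]
    }
}
```

With only character runs, the same text would come out as one undifferentiated stream; the paragraph runs are what let an extractor emit one line per paragraph.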

(However, we don't support decompressing the PAPX/CHPX properties, so you can't tell how a text run is formatted, only that its formatting differs from its neighbours. If anyone cares, you'll need to figure out the differences in the style table between the old and the new formats.)
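Without decoding the property bytes, the most an extractor can report is where the raw bytes change from one run to the next. The hypothetical `FormattingChanges` class below sketches that idea; the byte values are arbitrary placeholders standing in for undecoded CHPX data, and the class does not reflect how POI actually stores runs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not POI's API): with character properties left
 * undecoded, we can only compare raw bytes between neighbouring runs.
 */
public class FormattingChanges {

    /** A text run plus its raw, undecoded property bytes. */
    public static final class Run {
        final String text;
        final byte[] rawProps;

        public Run(String text, byte[] rawProps) {
            this.text = text;
            this.rawProps = rawProps;
        }
    }

    /**
     * Returns the indexes of runs whose raw property bytes differ from the
     * previous run. Byte inequality only signals a possible formatting
     * change; without decoding we cannot say what changed.
     */
    public static List<Integer> changePoints(List<Run> runs) {
        List<Integer> changes = new ArrayList<>();
        for (int i = 1; i < runs.size(); i++) {
            if (!Arrays.equals(runs.get(i - 1).rawProps, runs.get(i).rawProps)) {
                changes.add(i);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Plain ", new byte[] {}),
                new Run("still plain ", new byte[] {}),
                new Run("something else", new byte[] {0x35, 0x01})); // placeholder bytes
        System.out.println(changePoints(runs)); // [2]
    }
}
```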

> BTW, http://code.google.com/p/text-mining/ contains examples of fastsaved files you could use in your tests. They probably can't be committed to the ASF for legal reasons (can they????), but they make great tests nonetheless.

The text mining library is now LGPL, so we can't commit their test files to POI. If you fancy trying one of their sample fastsaved Word 6 or Word 95 files with Word6Extractor, I'd be interested to hear how it goes!


Nick
