Hi All

As you might've seen from my commits in the last few days, I've added some initial support to HWPF for Word 6 and Word 95 files. So far I've only been working towards text extraction (so I can ditch the text mining library from a work project). With lots of trial and error, some offset tips from WV's FIB parsing code, and some refactoring, we can now get text and paragraphs out of Word 6 and Word 95 files!

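To play with this, you'll want HWPFOldDocument / Word6Extractor. If you don't know in advance which format a file is in, try the normal HWPF classes first, catch OldWordFileFormatException, and switch to the old-format ones as needed.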
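
For anyone wondering what that catch-and-switch looks like, here's a rough sketch of the pattern. It's deliberately written without POI on the classpath: StandInOldFormatException, OldWordFallbackSketch and extractText are names I've made up for illustration, and the two extractor functions are placeholders. In real code you'd catch POI's OldWordFileFormatException, and the two functions would presumably wrap something like WordExtractor.getText() and Word6Extractor.getText().

```java
import java.util.function.Function;

// Hypothetical stand-in for POI's OldWordFileFormatException, declared here
// only so the sketch compiles on its own. It is not the real POI class.
class StandInOldFormatException extends RuntimeException {
    StandInOldFormatException(String message) {
        super(message);
    }
}

// Hypothetical helper class; nothing with this name exists in POI.
class OldWordFallbackSketch {

    // Try the extractor for newer files first. If it rejects the file as an
    // old (Word 6 / Word 95) format, retry with the old-format extractor.
    static String extractText(byte[] data,
                              Function<byte[], String> modernExtractor,
                              Function<byte[], String> oldExtractor) {
        try {
            return modernExtractor.apply(data);
        } catch (StandInOldFormatException e) {
            // Only the "this is an old file" signal triggers the fallback;
            // any other failure still propagates to the caller.
            return oldExtractor.apply(data);
        }
    }
}
```

Passing the extractors in as functions is just to keep the sketch self-contained; with the real classes you'd more likely construct them directly inside the try and catch blocks.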

I've got this working with various sample files produced by doing save-as from newer software. That means it's quite possible that real Word 6 / Word 95 files will break it, especially if they're quick-saved (I didn't have any examples of those).

As usual, please upload files that don't work to new Bugzilla entries, or even better, upload the broken file and the patch that fixes it :)

Nick
