Hi All
As you might've seen from my commits in the last few days, I've added some
initial support to HWPF for Word 6 and Word 95 files. So far I've only been
working towards text extraction (so I can ditch the text mining library
from a work project). With lots of trial and error, some offset tips from
WV's FIB parsing code, and some refactoring, we can now get text and
paragraphs out of Word 6 and Word 95 files!
To play with this, you'll want HWPFOldDocument / Word6Extractor. If you
don't know up front which format a file is in, open it the normal way,
catch OldWordFileFormatException, and switch to the old-format classes as
needed.
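The catch-and-switch pattern above can be sketched roughly as follows. This
is only an illustration of the control flow, written so it compiles without
POI on the classpath: OldFormatException and the two Supplier arguments are
hypothetical stand-ins for OldWordFileFormatException, the normal extractor
and the old-format extractor, not the real HWPF classes.

```java
import java.util.function.Supplier;

// Sketch of the "try the normal path, fall back to the old-format path" idea.
// Every name in here is a hypothetical stand-in, not part of HWPF.
class FallbackExtractDemo {

    /** Stand-in for HWPF's OldWordFileFormatException. */
    static class OldFormatException extends RuntimeException {
        OldFormatException(String message) {
            super(message);
        }
    }

    /**
     * Runs the normal extractor first. Only if it reports an old-format
     * file do we run the old-format extractor instead; any other
     * exception propagates to the caller unchanged.
     */
    static String extractText(Supplier<String> normal, Supplier<String> oldFormat) {
        try {
            return normal.get();
        } catch (OldFormatException e) {
            return oldFormat.get();
        }
    }

    public static void main(String[] args) {
        // Simulate a Word 6 / Word 95 file: the normal path rejects it.
        String text = extractText(
                () -> { throw new OldFormatException("Word 6 / Word 95 file"); },
                () -> "text from the old-format extractor");
        System.out.println(text);
    }
}
```

In real code the two suppliers would presumably wrap the normal extractor
and Word6Extractor, each opening its own fresh stream on the file, since the
first attempt will already have consumed the original one.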
I've got this working with various sample files produced by doing a save-as
from newer software. This means it's quite possible that real Word 6 /
Word 95 files will break it, especially if they're quick-saved (I didn't
have any examples of those).
As usual, please upload files that don't work to new Bugzilla entries, or
even better, upload the broken file and the patch that fixes it :)
Nick