Re: Word files & Build vs. Buy?

Nick Burch Thu, 09 Feb 2006 04:29:21 -0800

On Thu, 9 Feb 2006, Christiaan Fluit wrote:

My experience is that the WordDocument class crashes on about 25% of the documents, i.e. it throws some sort of Exception. I've tested POI 2.5.1-final as well as the current code in CVS, but both produce this result. I even suspect the output to be 100% the same, but I haven't verified this.

You could try using org.apache.poi.hwpf.HWPFDocument: get the range, then the paragraphs, and grab the text from each paragraph. If there's interest, I could probably commit an extractor that does this to POI.

(WordDocument is from the hdf package, which is older and less reliable than the current hwpf stuff)
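
For illustration, the range-then-paragraphs approach could look roughly like the sketch below. It assumes a POI build whose hwpf package provides HWPFDocument(InputStream), getRange(), Range.numParagraphs(), Range.getParagraph(int) and Paragraph.text(); the class name HwpfTextSketch and the extractText helper are made up for this example and are not part of POI.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper class, not part of POI.
public class HwpfTextSketch {

    // Reads a Word document from the stream and concatenates the text
    // of every paragraph in the document's overall range.
    static String extractText(InputStream in) throws IOException {
        HWPFDocument doc = new HWPFDocument(in);
        Range range = doc.getRange();

        StringBuilder text = new StringBuilder();
        for (int i = 0; i < range.numParagraphs(); i++) {
            Paragraph paragraph = range.getParagraph(i);
            // The raw paragraph text may still contain paragraph marks
            // or other control characters; no cleanup is attempted here.
            text.append(paragraph.text());
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to a .doc file.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.print(extractText(in));
        }
    }
}
```

Documents that hwpf cannot parse will still throw, so a caller indexing many files would probably want to catch exceptions per document rather than abort the whole run.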

Another reason I don't like this class is that it operates on an InputStream and internally creates a POIFSFileSystem which you cannot access, so that it becomes hard to extract document metadata as well (for which you need the POIFS) without buffering the entire InputStream.

If you're using HWPFDocument from CVS, then you can create that from a POIFSFileSystem.
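
As a rough sketch of what that allows: the stream is read once into a POIFSFileSystem, and that same object then serves both the text and the metadata. This assumes a POI version with the HWPFDocument(POIFSFileSystem) constructor and a getSummaryInformation() accessor on the document; with a build that lacks the accessor, the summary stream would have to be read from the file system through hpsf instead. SharedPoifsSketch and describe are names invented for this example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical helper class, not part of POI.
public class SharedPoifsSketch {

    // Parses the OLE2 container once, then uses it for both the
    // paragraph count and the document title.
    static String describe(InputStream in) throws IOException {
        // The stream is consumed here exactly once.
        POIFSFileSystem fs = new POIFSFileSystem(in);

        // The Word document is built on the already-parsed file system.
        HWPFDocument doc = new HWPFDocument(fs);
        int paragraphs = doc.getRange().numParagraphs();

        // Assumption: this accessor exists in the POI version in use.
        // It can return null when the file has no summary stream.
        SummaryInformation info = doc.getSummaryInformation();
        String title = (info == null) ? null : info.getTitle();

        return "paragraphs=" + paragraphs + ", title=" + title;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path to a .doc file.
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(describe(in));
        }
    }
}
```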


Nick
