On Mon, 15 Dec 2003, Pleasant, Tracy wrote:
> As a spinoff, I was wondering if anyone has been happy with indexing and
> searching Word docs. What about reading the contents? Any problems?

In the scratchpad of POI is src/org/hdf/extractor, which has all the code
you need to pull out the text of a Word document. I use this and some simple
HPSF code (to extract the document metadata) with Lucene, and it works great.

Nick
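
For illustration only, here is a minimal sketch of the shape such a pipeline might take: the extracted body text and the document metadata become named fields of one record, which is then indexed field by field. It is not the code described above. WordSource is a hypothetical stand-in for the POI/HPSF extraction step, TinyIndex is a hypothetical in-memory stand-in for Lucene, and the field names ("contents", "title", "author") are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WordIndexSketch {

    /** Hypothetical stand-in for the extraction side (POI text + HPSF metadata). */
    interface WordSource {
        String bodyText();
        Map<String, String> metadata();
    }

    /** Hypothetical stand-in for the search side: a tiny in-memory inverted index. */
    static class TinyIndex {
        // Maps "field:term" to the ids of the documents containing that term.
        private final Map<String, Set<String>> postings = new HashMap<>();

        void add(String docId, Map<String, String> fields) {
            for (Map.Entry<String, String> f : fields.entrySet()) {
                // Lowercase and split on non-word characters; a real analyzer does far more.
                for (String term : f.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                    if (term.isEmpty()) {
                        continue;
                    }
                    postings.computeIfAbsent(f.getKey() + ":" + term, k -> new TreeSet<>())
                            .add(docId);
                }
            }
        }

        Set<String> search(String field, String term) {
            String key = field + ":" + term.toLowerCase(Locale.ROOT);
            return postings.getOrDefault(key, new TreeSet<>());
        }
    }

    /** Flattens one Word document into named fields: body first, then metadata. */
    static Map<String, String> toFields(WordSource src) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("contents", src.bodyText());
        fields.putAll(src.metadata());
        return fields;
    }

    public static void main(String[] args) {
        // Hard-coded sample data in place of a real .doc file.
        WordSource report = new WordSource() {
            public String bodyText() {
                return "Quarterly sales figures for the northern region.";
            }
            public Map<String, String> metadata() {
                Map<String, String> m = new LinkedHashMap<>();
                m.put("title", "Sales Report");
                m.put("author", "Jane Example");
                return m;
            }
        };

        TinyIndex index = new TinyIndex();
        index.add("report.doc", toFields(report));

        System.out.println(index.search("contents", "sales"));
        System.out.println(index.search("author", "jane"));
        System.out.println(index.search("title", "figures"));
    }
}
```

Keeping the metadata in separate fields from the body is the design choice worth noting: a search can then be restricted to, say, the author or the title without matching the same word in the document text. In a real setup the WordSource implementation would wrap the POI extractor and the HPSF property reading, and TinyIndex would be replaced by a Lucene index.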
