Nicer text extraction, although this package still has problems with many
Word documents. These are the errors it often produces:
java.io.FileNotFoundException: no such entry: "0Table"
    at org.apache.poi.poifs.filesystem.DirectoryNode.getEntry(DirectoryNode.java:282)
...and...
java.lang.ArrayIndexOutOfBoundsException
    at org.apache.poi.hdf.extractor.Utils.convertBytesToShort(Utils.java:83)
Every other crawled .doc raises one of these errors.
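
Until the extractor copes with these files, a crawler can at least skip them instead of aborting. A minimal sketch, assuming the extractor can be wrapped behind a small interface: TextExtractor and extractOrNull are hypothetical names made up for this illustration, not part of POI or the textmining.org package.

```java
import java.io.IOException;
import java.io.InputStream;

final class SafeExtraction {

    /** Hypothetical stand-in for whichever Word text extractor is in use. */
    interface TextExtractor {
        String extract(InputStream in) throws IOException;
    }

    /**
     * Returns the extracted text, or null when the extractor fails.
     * FileNotFoundException is an IOException and
     * ArrayIndexOutOfBoundsException is a RuntimeException, so both of the
     * errors quoted above end up in the catch clause.
     */
    static String extractOrNull(TextExtractor extractor, InputStream in) {
        try {
            return extractor.extract(in);
        } catch (IOException | RuntimeException e) {
            // Report the failure and let the caller move on to the next file.
            System.err.println("Skipping document: " + e);
            return null;
        }
    }
}
```

The caller would then index a document only when the result is not null, so one bad .doc does not stop the whole crawl.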
/Ronnie
> -----Original message-----
> From: Ryan Ackley [mailto:[EMAIL PROTECTED]]
> Sent: 31 January 2003 13:13
> To: Lucene Users List
> Subject: THIS IS HOW YOU INDEX WORD DOCUMENTS
>
>
> I wrote the Apache POI HDF (Word library) code. I also wrote a light version
> that does only text extraction. You can download it at
> http://www.textmining.org.
>
> Ryan Ackley
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>