Word documents saved with FastSave enabled contain the original document followed by 
deltas to that document. Once the deltas exceed a certain size, they are merged back 
into the document. That means that unless you apply the deltas, you won't know what 
the actual final contents are.
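
To illustrate why scanning the raw file is misleading, here is a toy sketch in Python. It is not the real Word file format: the (offset, delete_len, insert_text) delta layout and the apply_deltas name are made up for the example. It only shows that the base text is stale until every delta has been replayed in order.

```python
# Toy model only: this is NOT the actual Word binary layout.
# Each delta is a hypothetical (offset, delete_len, insert_text) edit.

def apply_deltas(base, deltas):
    """Replay each delta, in order, against the base text."""
    text = base
    for offset, delete_len, insert_text in deltas:
        text = text[:offset] + insert_text + text[offset + delete_len:]
    return text

base = "The quick brown fox"
deltas = [
    (4, 5, "slow"),  # replace "quick" with "slow"
    (9, 5, "red"),   # replace "brown" with "red" (offset is in the edited text)
]

final = apply_deltas(base, deltas)
print(final)  # The slow red fox
```

An indexer that only looked at the base text here would index "quick" and "brown", neither of which is in the final document.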

Herb....

-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 30, 2003 2:49 PM
To: Lucene Users List
Subject: Re: Exotic format indexing?


Unfortunately, it is not quite so easy.  I am not sure about Word
documents, but PDFs usually have their contents compressed, so raw
"fishing" around for text would be pointless.  Your best bet is to use a
package like the one from textmining.org that handles various formats for
you.
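
A small Python sketch of the compression problem, assuming a zlib/deflate stream (the kind of compression PDF content streams commonly use). The sample text and variable names are invented for the example, and this is not a PDF parser. It only shows that the words are not findable in the compressed bytes until the stream is decompressed.

```python
import zlib

# Made-up sample text standing in for a page's content.
text = b"Lucene indexes text, not compressed bytes. " * 4

# Compress it the way a deflate-compressed stream would be stored.
stream = zlib.compress(text)

# "Fishing" in the raw bytes does not find the word...
found_in_raw = b"Lucene" in stream

# ...but it is there once the stream has been decompressed.
found_after_decompress = b"Lucene" in zlib.decompress(stream)

print(found_in_raw, found_after_decompress)
```

A real PDF adds more layers on top of this (object structure, font encodings, and so on), which is why a dedicated extraction package is the safer route.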

Ben

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
