Word documents saved with FastSave enabled contain the original document followed by deltas to it. Once the deltas exceed a certain size, they are merged back into the document. That means that unless you replay the deltas, you won't know what the actual final contents are.
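
To make that concrete, here is a rough sketch in Python. It is a toy model only: the (offset, delete_count, inserted_text) delta layout and the apply_deltas name are made up for illustration and are not the real Word binary format. It just shows why scanning the raw bytes gives you stale text until the deltas are replayed.

```python
# Toy model only: this is NOT the real Word on-disk layout.
# A "file" here is a base text plus an ordered list of hypothetical deltas,
# each delta being (offset, delete_count, inserted_text).

def apply_deltas(base, deltas):
    """Replay each delta in order against the current text."""
    text = base
    for offset, delete_count, inserted in deltas:
        text = text[:offset] + inserted + text[offset + delete_count:]
    return text

base = "The budget is 100 dollars."
deltas = [
    (14, 3, "250"),  # replace "100" with "250"
    (0, 3, "Our"),   # replace "The" with "Our"
]

# What a naive scan of the stored data would see: the old base text
# followed by the inserted fragments, in the order they were appended.
raw_scan = base + "".join(inserted for _, _, inserted in deltas)

# What the document actually says once the deltas are replayed.
final_text = apply_deltas(base, deltas)

print(raw_scan)
print(final_text)
```

An indexer working from raw_scan would index the old "100" figure, which no longer appears in the document a reader sees.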
Herb....

-----Original Message-----
From: Ben Litchfield [mailto:[EMAIL PROTECTED]
Sent: Thursday, October 30, 2003 2:49 PM
To: Lucene Users List
Subject: Re: Exotic format indexing?

Unfortunately, it is not quite so easy. I am not sure about Word documents, but PDFs usually have their contents compressed, so a raw "fishing" around for text would be pointless. Your best bet is to use a package like the one from textmining.org that handles various formats for you.

Ben
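
The point about compressed PDF contents can be shown with a small Python sketch. It only models the compression layer: PDF streams commonly use the Flate (zlib/deflate) filter, but a real PDF also has object wrappers, possibly other filters or encryption, and font encodings on top, none of which are handled here. The sample text and variable names are made up for illustration.

```python
import zlib

# Stand-in for the text of a page. Real PDF content streams also wrap
# the text in drawing operators, which this sketch ignores.
text = b"Lucene indexes extracted text. " * 20

# Roughly what ends up in the file when the stream is Flate-compressed.
stream = zlib.compress(text)

# "Fishing" for a word in the raw bytes finds nothing...
found_raw = b"Lucene" in stream

# ...but the word is there once the stream is decompressed.
recovered = zlib.decompress(stream)
found_decoded = b"Lucene" in recovered

print(found_raw, found_decoded)
```

This is why a format-aware package is the better bet: it knows which filters to undo before looking for text.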
