On Oct 30, 2003, at 20:48, Ben Litchfield wrote:
Unfortunately, it is not quite so easy. I am not sure about Word documents
The raw text is visible.
but PDFs usually have their contents compressed
Yep. PDF is really an image format ;)
so a raw "fishing" around for text would be pointless.
That's alright. I can handle PDF separately if the need arises.
Your best bet is to use a
package like the one from textmining.org that handles various formats for
you.
Perhaps. But I'm only looking for a "good enough" solution, not a perfect one :)
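For illustration, here is a minimal sketch of the raw "fishing" approach discussed above. It is written in Python purely as an assumption, since the thread names no language, and `fish_for_text` and `min_len` are hypothetical names. It pulls runs of printable ASCII out of arbitrary bytes, much like the Unix `strings` tool, and then runs the same scan over zlib-compressed data to show why the approach is unlikely to help with compressed PDF content.

```python
import re
import zlib
from typing import List


def fish_for_text(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of at least `min_len` printable ASCII bytes found in `data`.

    Hypothetical helper: a crude, format-agnostic scan, not a real parser.
    """
    # Printable ASCII is the range 0x20 (space) through 0x7e (tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Made-up bytes standing in for a binary file with readable text embedded in it.
sample = b"\x00\x01Hello, world\x00\xff\x02plain text here\x03"
print(fish_for_text(sample))

# The same kind of text after zlib compression, standing in for a compressed stream.
compressed = zlib.compress(b"Hello, world, plain text here " * 20)
print(fish_for_text(compressed))
```

The first call recovers the embedded strings, while the second returns at most a few accidental fragments. Note that this scan is ASCII-only, so it would miss any text a file stores in a 16-bit encoding, which may be the case for some Word documents.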
Cheers,
PA.
