On Oct 30, 2003, at 20:48, Ben Litchfield wrote:


Unfortunately, it is not quite so easy.  I am not sure about Word documents

The raw text is visible.


but PDFs usually have their contents compressed

Yep. PDF is really an image format ;)


so a raw
"fishing" around for text would be pointless.

That's alright. I can handle PDF separately if the need arises.
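
To make the compression point concrete, here is a minimal Python sketch. It assumes the PDF stores its page content with the FlateDecode filter, which uses the same deflate/zlib format as Python's zlib module. The content stream is a made-up stand-in, not taken from a real file, and the helper name visible_in_raw_bytes is hypothetical.

```python
import zlib


def visible_in_raw_bytes(word: bytes, raw: bytes) -> bool:
    """Return True if the word appears literally in the raw bytes."""
    return word in raw


# Made-up stand-in for a PDF page content stream (not from a real file).
content = b"BT /F1 12 Tf (Lucene indexes plain text) Tj ET\n" * 20

# Assumption: the stream is stored with FlateDecode, i.e. zlib/deflate.
stored = zlib.compress(content)

# The word is readable before compression but not in the stored bytes.
found_before = visible_in_raw_bytes(b"Lucene", content)
found_after = visible_in_raw_bytes(b"Lucene", stored)

# Decompressing first brings the text back.
recovered = zlib.decompress(stored)
```

Scanning the stored bytes finds nothing, so any fishing would have to come after a decompression step.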


Your best bet is to use a
package like the one from textmining.org that handles various formats for
you.

Perhaps. But I'm only looking for a "good enough" solution, not a perfect one :)
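
For the Word case, a "good enough" pass in the spirit of the Unix strings utility could look like the Python sketch below. It assumes the text sits in the file either as runs of 8-bit printable ASCII or as UTF-16LE runs of the same characters, so non-ASCII text is missed and some binary noise will slip through. The names fish_text and min_len are made up for illustration.

```python
import re


def fish_text(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters found in raw bytes."""
    # 8-bit runs: at least min_len printable ASCII bytes in a row.
    narrow = re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)
    # UTF-16LE runs: printable ASCII byte followed by a zero byte, repeated.
    wide = re.findall(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len, data)
    found = [run.decode("ascii") for run in narrow]
    found += [run.decode("utf-16-le") for run in wide]
    return found


def fish_file(path: str, min_len: int = 4) -> list:
    """Read a file as raw bytes and fish out its printable runs."""
    with open(path, "rb") as handle:
        return fish_text(handle.read(), min_len)
```

The resulting strings could be joined and handed to an indexer as one text field. Raising min_len cuts down on noise at the cost of dropping short words.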


Cheers,

PA.

