> Finally, a while back, somebody on this list mentioned quiet a > different approach: simply read the raw binary document and go fishing > for what looks like text. I would like to try that :)
I have tried that approach and it works ok. You end up with a bunch of junk in with the useful stuff. It can clutter up your index and make searching slower. There are a lot of file formats that don't store all of the text as sequential text so it won't work. PDF is one, I know that PowerPoint is another. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
