Hi all,

If you want to index Word files, or have been doing so for a while, there is a new parse_word_doc.pl at:

http://www.st.hhs.nl/htdig/parse_word_doc.pl.txt

Features:

- code speedup (mucho!)
- the punctuation-matching patterns didn't work before. Now they strip .,';: etc. at the beginning or end of a word, but not in between. So "endings." becomes "endings", while 1,234,777.99 stays as it is. This is nice when you have URLs in your document.

You need catdoc to run this script. See the code.

--jesse

----------------------------------------------------------------------
To unsubscribe from the htdig mailing list, send a message to [EMAIL PROTECTED] containing the single word "unsubscribe" in the body of the message.
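For readers curious about the punctuation rule, here is a minimal Python sketch of the behaviour described for parse_word_doc.pl: strip punctuation only at the start or end of a word and leave it alone in between. It is not the Perl code from the script. It assumes whitespace tokenization and a guessed punctuation set (the post only lists .,';: "etc"), and the names strip_edge_punctuation, index_words and the example.com URL are hypothetical.

```python
import re
from typing import List

# Hypothetical punctuation set: the post lists . , ' ; : "etc", so the
# characters the real script strips may differ.
EDGE_PUNCT = ".,';:!?\""

# Match a leading run OR a trailing run of edge punctuation.
# Punctuation inside a word (as in 1,234,777.99 or a URL) is never matched.
_EDGE_RE = re.compile(
    "^[" + re.escape(EDGE_PUNCT) + "]+|[" + re.escape(EDGE_PUNCT) + "]+$"
)


def strip_edge_punctuation(word: str) -> str:
    """Remove punctuation at the start or end of a word, keep it in between."""
    return _EDGE_RE.sub("", word)


def index_words(text: str) -> List[str]:
    """Split on whitespace, clean each token, drop tokens that become empty."""
    cleaned = (strip_edge_punctuation(tok) for tok in text.split())
    return [w for w in cleaned if w]


if __name__ == "__main__":
    sample = "Word endings. Total: 1,234,777.99 see http://www.example.com/a.html."
    print(index_words(sample))
```

Run on the sample line, "endings." and "Total:" lose their trailing punctuation, while the number and the interior dots and colon of the URL survive; only the sentence-final period after the URL is removed.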
