Hi all,

If you want to index Word files, or have been doing so for a while,
there is a new parse_word_doc.pl at:

http://www.st.hhs.nl/htdig/parse_word_doc.pl.txt

features: code speedup (mucho!)
          the matching patterns didn't work before. Now they match
               punctuation (.,';: etc.) at the beginning or end of a
               word, but not in the middle of one. So "endings." is
               changed to "endings", but 1,234,777.99 stays that way.
               This is nice when you have URLs in your document.
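
For anyone who wants to see the idea without reading the script, here is
a rough sketch of the edge-only punctuation stripping. It is written in
Python rather than the script's Perl, and it is not the code from
parse_word_doc.pl. The punctuation set and the names clean_word and
clean_text are assumptions made up for illustration.

```python
import re

# Assumed punctuation set; the message only says ".,';: etc", so the
# exact characters handled by parse_word_doc.pl may differ.
EDGE_PUNCT = re.escape(".,';:!?")

# Match a run of punctuation at the start OR at the end of a word.
# Punctuation in the middle of a word is never touched.
EDGE_RE = re.compile(r"^[{0}]+|[{0}]+$".format(EDGE_PUNCT))


def clean_word(word):
    """Strip leading and trailing punctuation from a single word."""
    return EDGE_RE.sub("", word)


def clean_text(text):
    """Clean every whitespace-separated word, dropping words that become empty."""
    words = [clean_word(w) for w in text.split()]
    return " ".join(w for w in words if w)
```

With this, clean_word("endings.") gives "endings", while
clean_word("1,234,777.99") comes back unchanged because all of its
punctuation sits between digits.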

You need catdoc installed for this to work; see the code for details.

--jesse