On Monday, February 17, 2003, at 05:49 PM, Adam Brown wrote:
"womans" does not appear in the db.wordlist nor does it appear in the -vvvv
output (see the attached rundig output). I have run this with both the
default 'valid_punctuation' and my customised one.
Can't work out why it would be indexing "womans" as woman, why does it trim
the "s"?

Based on the output you attached, it appears that the problem is one of encoding. The pages are not using an ASCII character for the apostrophe. Instead, the pages are using a Microsoft Windows Latin-1 extension (a hex 92). I guess you either need to fix the pages or try writing a hex 92 into your valid_punctuation attribute. I am not sure if a real hex 92 will have an adverse effect on the parse. If there is a way to encode such characters in the attribute, I am not aware of it.
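
If fixing the pages is the route taken, a rough sketch of the idea in Python might look like the following. It assumes the pages are stored in a single-byte encoding such as Windows-1252, where a hex 92 byte is the curly apostrophe; in a UTF-8 file the same byte value can be part of a multi-byte character, so this should not be run blindly on such files. The function names are made up for illustration and are not part of ht://Dig.

```python
# Hypothetical helpers, not part of ht://Dig. Assumes single-byte
# (Windows-1252 style) page content, where 0x92 is a curly apostrophe.
HEX_92 = b"\x92"


def count_hex92(data: bytes) -> int:
    """Return how many hex 92 bytes appear in the raw page content."""
    return data.count(HEX_92)


def normalize_apostrophes(data: bytes) -> bytes:
    """Replace each hex 92 byte with a plain ASCII apostrophe."""
    return data.replace(HEX_92, b"'")


if __name__ == "__main__":
    sample = b"a woman\x92s guide"
    print(count_hex92(sample))            # number of hex 92 bytes found
    print(normalize_apostrophes(sample))  # same bytes with ASCII apostrophes
```

Running the count over a page first shows whether it is affected at all before anything is rewritten.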
Jim

