On Monday, February 17, 2003, at 05:49 PM, Adam Brown wrote:

"womans" does not appear in the db.wordlist nor does it appear in the -vvvv
output (see the attached rundig output). I have run this with both the
default 'valid_punctuation' and my customised one.

I can't work out why it would be indexing "womans" as "woman". Why does it trim
the "s"?

Based on the output you attached, the problem appears to be one of encoding. The pages do not use the ASCII apostrophe (hex 27). Instead, they use a character from the Microsoft Windows extension to Latin-1 (a hex 92).

I guess you either need to fix the pages or try writing a literal hex 92 byte into your valid_punctuation attribute. I am not sure whether a real hex 92 will have an adverse effect on the parse. If there is a way to encode such characters in the attribute, I am not aware of it.
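
If fixing the pages is the route you take, something along these lines might help. It is only a rough sketch in Python, not anything that ships with ht://Dig: the function names are made up for illustration, and it assumes the pages are in a single-byte Windows encoding, where hex 92 is the curly apostrophe. In a UTF-8 page a hex 92 byte can be part of a longer character, so a blind replacement there would corrupt the text.

```python
# Sketch only: helper names are hypothetical, not part of ht://Dig.
# Assumes the input is single-byte Windows-1252 text, where byte 0x92
# is the right single quotation mark used as an apostrophe.

CP1252_APOSTROPHE = b"\x92"
ASCII_APOSTROPHE = b"'"  # hex 27


def count_cp1252_apostrophes(raw: bytes) -> int:
    """Return how many hex 92 bytes appear in the raw page data."""
    return raw.count(CP1252_APOSTROPHE)


def normalize_apostrophes(raw: bytes) -> bytes:
    """Replace every hex 92 byte with a plain ASCII apostrophe."""
    return raw.replace(CP1252_APOSTROPHE, ASCII_APOSTROPHE)


if __name__ == "__main__":
    page = b"<p>A woman\x92s guide</p>"
    print(count_cp1252_apostrophes(page))  # 1
    print(normalize_apostrophes(page))     # b"<p>A woman's guide</p>"
```

Running the count over a page first tells you whether hex 92 is present at all before you rewrite anything.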

Jim


