On Tuesday 18 February 2003 12:22, Jim Cole wrote:
> On Monday, February 17, 2003, at 05:49 PM, Adam Brown wrote:
> > "womans" does not appear in the db.wordlist nor does it appear in the
> > -vvvv output (see the attached rundig output). I have run this with
> > both the default 'valid_punctuation' and my customised one.
> >
> > Can't work out why it would be indexing "womans" as woman, why does it
> > trim the "s"?
>
> Based on the output you attached, it appears that the problem is one of
> encoding. The pages are not using an ASCII character for the
> apostrophe. Instead, the pages are using a Microsoft Windows Latin-1
> extension (a hex 92). I guess you either need to fix the pages or try
> writing a hex 92 into your valid_punctuation attribute. I am not sure
> if a real hex 92 will have an adverse effect on the parse. If there is
> a way to encode such characters in the attribute, I am not aware of it.
>
> Jim
Thanks for the suggestion. However, if you look at the 'title' meta tag for the page, you will see that the word "womans" is used without an apostrophe, and this instance is still being indexed as "woman".

It's got me stumped. Can someone point me to the correct location in the source where I can check what is going on?

Ad
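For anyone wanting to separate the two cases discussed above (a hex 92 apostrophe versus a plain "womans" with no apostrophe at all), a rough way to check is to scan the raw page bytes for 0x92, which Windows-1252 maps to the right single quotation mark (U+2019). The following is only a sketch in Python, independent of ht://Dig: the `find_byte` helper and the `sample` bytes are made up for illustration and are not taken from the actual pages or the rundig output.

```python
# Sketch only: report where byte 0x92 occurs in raw page bytes.
# In Windows-1252, 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK.
CP1252_APOSTROPHE = 0x92


def find_byte(data: bytes, value: int = CP1252_APOSTROPHE, context: int = 10):
    """Return (offset, surrounding text) for every occurrence of `value`."""
    needle = bytes([value])
    hits = []
    start = 0
    while True:
        pos = data.find(needle, start)
        if pos == -1:
            break
        # Decode a small window around the hit so it can be read in context.
        snippet = data[max(0, pos - context):pos + context + 1]
        hits.append((pos, snippet.decode("cp1252", errors="replace")))
        start = pos + 1
    return hits


# Hypothetical page fragment: the title has no apostrophe, the body has a 0x92.
sample = b"<title>womans health</title><p>a woman\x92s guide</p>"

for offset, text in find_byte(sample):
    print(offset, repr(text))
```

To check a saved copy of a real page, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to `find_byte`. If the title turns up no hits but is still indexed as "woman", the encoding explanation does not cover that instance.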

