Rob Kremer's bits of Thu, 18 Jul 2002 translated to:

>I am trying to determine if I have an incorrect configuration or if there is a
>bug. When I search for an exact phrase, it will return some matches that don't
>have the exact match, such as searching for "htdig exclude" on the
>http://www.htdig.org site. It will return 6 matches, only 2 have exact matches
>in them, the others are two ChangeLog files and two FAQ files. Any ideas?
I believe that this is expected behavior. The word positions are computed after parsing terms from the original text. That means you need to account for what happens with numbers, words shorter than the specified minimum length, characters that break strings of text into word parts, etc.

In this specific case, I think the extra hits you are seeing are due to the following bits of text.

> htdig/htdig.cc: If exclude_urls

In this case, htdig.cc is split into htdig and cc. The cc is dropped because it has fewer than three characters. The : is tossed out because it is punctuation rather than part of a word. The 'If' is also shorter than three characters. The exclude_urls is split into exclude and urls. The net result is that you end up with an htdig next to an exclude.

> htdig/Retriever.cc,htdig/htdig.cc: "exclude_urls"

Basically the same as the above.

> htdig&restrict=&exclude

This occurs in both FAQ pages. This ends up being parsed as htdig&restrict exclude, where htdig&restrict is treated as a two-part term (i.e. both htdig and restrict are given the same word position). So the result is that 'htdig' has a word position adjacent to 'exclude'.

Jim
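To make the position bookkeeping concrete, here is a rough Python sketch of the behavior described above. It is only a simplified model of my explanation, not ht://Dig's actual parser: the names index_positions and phrase_matches, the set of joining characters, and the regular expressions are all assumptions made for illustration. The only values taken from the discussion are the three-character minimum and the rule that the parts of one compound term share a word position.

```python
import re

# Assumed for illustration: words shorter than this are dropped
# (the three-character minimum mentioned in the reply).
MIN_WORD_LENGTH = 3

# Hypothetical set of characters that join word parts into one compound term.
CHUNK_RE = re.compile(r"[A-Za-z0-9._/&-]+")
JOINER_RE = re.compile(r"[._/&-]")


def index_positions(text):
    """Map each kept word part to the set of word positions it occupies."""
    positions = {}
    pos = 0
    for chunk in CHUNK_RE.findall(text):
        # Split the compound term into parts and drop the short ones.
        parts = [p.lower() for p in JOINER_RE.split(chunk)
                 if len(p) >= MIN_WORD_LENGTH]
        if not parts:
            continue  # a fully dropped term does not consume a position
        for part in parts:
            # All parts of one compound term share the same position.
            positions.setdefault(part, set()).add(pos)
        pos += 1
    return positions


def phrase_matches(phrase, text):
    """True if the phrase words occupy consecutive positions in text."""
    words = phrase.lower().split()
    if not words:
        return False
    positions = index_positions(text)
    return any(
        all(start + i in positions.get(word, set())
            for i, word in enumerate(words))
        for start in positions.get(words[0], set())
    )


if __name__ == "__main__":
    for sample in ("htdig/htdig.cc: If exclude_urls",
                   "htdig&restrict=&exclude",
                   "htdig can exclude"):
        print(sample, "->", phrase_matches("htdig exclude", sample))
```

Under this model the first two samples count as hits for "htdig exclude" even though the literal phrase never appears, while the third does not, because 'can' is long enough to keep its own position between the two words.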

