According to Evaldas Imbrasas:
> > Apparently there's something very weird happening with scoring of the
> > prefix fuzzy searches. You don't mention what your locale is set to,
> > but if you use a locale in which the decimal point is a comma, you may ...
> However, I found the parameter that is really affecting this thing -
> it's multimatch_factor. My setting was 10. I tried changing it to 1000
> or 5, but with same effect as above. However, when I set it to 1, the
> problem disappears. What strange is that all my test searches included
> only one fuzzy keyword, and docs say that multimatch_factor has effect
> only when 'OR' method is used for search. Does it automatically expands
> 'keyword*' to 'keyword1 OR keyword2 OR...' or something?
>
> I found the solution that works for me - just set multimatch_factor to
> 1. Hope this helps you to debug the code though...

Yes, that does shed some light on things. Fuzzy match algorithms do indeed expand queries just as you say (you can see that in LOGICAL_WORDS if you display that template variable in your results, as the default templates do), so they will trigger the multimatch_factor.

I wasn't wild about the implementation of this attribute because it was a bit of a hack. The problem is that with each additional match that's ORed in, the cumulative score is remultiplied by the factor, so the score increases at a phenomenal rate. I suspect that with prefix matching, which can yield lots of matches, the score is overflowing.

I've given some thought to a more robust implementation of this attribute that will increase the score linearly rather than geometrically. I don't have time to code it just yet, so it may have to wait a week or so. A possible problem with this new implementation, though, is that it will require a new count field in the DocMatch class to keep track of how many matching words each document has. This will increase DocMatch from 12 to 16 bytes (plus overhead), so htsearch will end up consuming a lot more memory on sites with lots of documents, whether you use this feature or not. I'm not wild about that, but I don't see any other way to implement this cleanly.

I should also rewrite the description for this attribute, as it's a bit misleading. It works not only when the 'OR' method is used for searches, but for anything that does ORing of results, including fuzzy expansions and boolean queries.

I did also find some problems in htdig's handling of location values that may, in some cases, cause negative weights for words, but then you'd see those in your db.wordlist. I'll try to tackle those when I make the necessary changes to WordList::Word() to support location_factor.

--
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7 (Canada)    Fax:    (204)789-3930
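
To illustrate the geometric versus linear growth described above, here is a rough sketch in Python. It is not the htsearch code: the function names, the unit word weights, and the exact form of the "linear" variant are assumptions made only for illustration. The first function remultiplies the running score by the factor for every additional ORed match, as described above; the second is one possible reading of a linear scheme, where a per-document match count scales the multiplier instead.

```python
def geometric_score(word_weights, factor):
    """Hypothetical model of the behaviour described above: each additional
    ORed match is added in, and the running total is remultiplied by factor."""
    score = 0.0
    for i, weight in enumerate(word_weights):
        if i == 0:
            score = weight                     # first match: no bonus yet
        else:
            score = (score + weight) * factor  # bonus compounds on every match
    return score


def linear_score(word_weights, factor):
    """Hypothetical alternative: keep a count of matching words and let the
    multiplier grow linearly with that count instead of compounding."""
    match_count = len(word_weights)            # the extra per-document counter
    if match_count == 0:
        return 0.0
    multiplier = 1 + (match_count - 1) * (factor - 1)
    return sum(word_weights) * multiplier


if __name__ == "__main__":
    # A prefix search such as 'keyword*' may expand to many ORed words.
    for n in (1, 3, 11, 50):
        weights = [1.0] * n                    # assumed unit weight per word
        print(n, geometric_score(weights, 10), linear_score(weights, 10))
```

In this model, with a factor of 10 and unit weights, the compounding score passes the range of a signed 32-bit integer at 11 matches, while the linear variant stays small. With a factor of 1 both functions reduce to a plain sum of the weights, which is consistent with the problem disappearing at that setting.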

