According to Evaldas Imbrasas:
> > Apparently there's something very weird happening with scoring of the
> > prefix fuzzy searches.  You don't mention what your locale is set to,
> > but if you use a locale in which the decimal point is a comma, you may
...
> However, I found the parameter that is really affecting this thing -
> it's multimatch_factor. My setting was 10. I tried changing it to 1000
> or 5, but with the same effect as above. However, when I set it to 1, the
> problem disappears. What's strange is that all my test searches included
> only one fuzzy keyword, and the docs say that multimatch_factor has an effect
> only when the 'OR' method is used for searches. Does it automatically expand
> 'keyword*' to 'keyword1 OR keyword2 OR...' or something?
> 
> I found the solution that works for me - just set multimatch_factor to
> 1. Hope this helps you to debug the code though...

Yes, that does shed some light on things.  Fuzzy match algorithms do
indeed expand queries just as you say (you can see that in LOGICAL_WORDS
if you display that template variable in your results, as the default
templates do), so they will trigger the multimatch_factor.  I wasn't
wild about the implementation of this attribute because it was a bit
of a hack.  The problem is that with each additional match that's ORed
in, the cumulative score is remultiplied by the factor, so the score
increases at a phenomenal rate.  I suspect that with prefix matching,
which can yield lots of matches, the score is overflowing.
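To illustrate why the score runs away, here is a small Python model of the geometric behaviour described above. It is a hypothetical sketch, not htsearch's actual code: it assumes each additional ORed match remultiplies the running score by multimatch_factor before adding the new word's score, and it assumes, purely for illustration, that scores are held in a 32-bit signed integer (INT32_MAX is an assumed limit, not a confirmed htsearch type).

```python
# Assumed limit, for illustration only: a 32-bit signed integer score.
INT32_MAX = 2**31 - 1

def geometric_score(word_score, matches, multimatch_factor):
    """Hypothetical model of the current behaviour: every additional ORed
    match remultiplies the running score by multimatch_factor."""
    score = word_score
    for _ in range(matches - 1):
        score = score * multimatch_factor + word_score
    return score

def first_overflow(word_score, multimatch_factor, limit=INT32_MAX, max_matches=1000):
    """Return the smallest number of matches whose score exceeds limit,
    or None if no count up to max_matches does."""
    for n in range(1, max_matches + 1):
        if geometric_score(word_score, n, multimatch_factor) > limit:
            return n
    return None

for factor in (1, 5, 10, 1000):
    print(factor, first_overflow(1, factor))
# 1 None / 5 15 / 10 11 / 1000 5
```

Under these assumptions, a factor of 10 passes the limit once a document matches 11 expanded words, and factors of 5 or 1000 fail only slightly later or sooner, while a factor of 1 just grows linearly. That pattern is consistent with setting multimatch_factor to 1 making the problem go away.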

I've given some thought to a more robust implementation of this attribute
that will increase the score linearly rather than geometrically.  I don't
have time to code it just yet, so it may have to wait a week or so.
A possible problem with this new implementation, though, is that it
will require a new count field in the DocMatch class to keep track of
how many matching words each document has.  This will increase DocMatch
from 12 to 16 bytes (plus overhead), so htsearch will end up consuming
a lot more memory on sites with lots of documents, whether you use this
feature or not.  I'm not wild about that, but I don't see any other way
to implement this cleanly.
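A minimal sketch of the linear alternative, in Python with ctypes. It assumes a hypothetical DocMatch layout of three 4-byte fields (id, score, anchor) to match the 12-byte figure above; the real field names and types in htsearch may differ. Adding a count field brings the record to 16 bytes, and the boost is applied once from that count rather than once per ORed match. The formula 1 + multimatch_factor * (count - 1) is an assumption chosen only because it grows linearly.

```python
import ctypes

# Hypothetical layout: three 4-byte fields give the 12 bytes mentioned above.
class DocMatch(ctypes.Structure):
    _fields_ = [("id", ctypes.c_int),
                ("score", ctypes.c_float),
                ("anchor", ctypes.c_int)]

# Same layout plus a per-document count of matching words (16 bytes).
class DocMatchWithCount(ctypes.Structure):
    _fields_ = DocMatch._fields_ + [("count", ctypes.c_int)]

def add_match(doc, word_score):
    """OR in one more matching word: accumulate its score and bump the
    count, without applying multimatch_factor yet."""
    doc.score += word_score
    doc.count += 1

def final_score(doc, multimatch_factor):
    """Apply the boost once, growing linearly with the number of matches
    (assumed formula, for illustration only)."""
    return doc.score * (1 + multimatch_factor * max(doc.count - 1, 0))

print(ctypes.sizeof(DocMatch), ctypes.sizeof(DocMatchWithCount))  # 12 16
doc = DocMatchWithCount(id=1, score=0.0, anchor=0, count=0)
for _ in range(3):
    add_match(doc, 2.0)
print(final_score(doc, 10))  # 126.0
```

The extra 4 bytes apply to every DocMatch, so 1,000,000 matched documents would need roughly 4 MB more, whether or not multimatch_factor is used.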

I should also rewrite the description for this attribute, as it's a bit
misleading.  It works not only when the 'OR' method is used for searches,
but for anything that does ORing of results, including fuzzy expansions
and boolean queries.
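The following hypothetical Python sketch shows how a single prefix keyword such as 'keyword*' turns into several ORed words, so one keyword in the query can still give a document multiple matches and trigger multimatch_factor. The word list, the " or " joining, and the helper names are illustrative only; they are not htsearch's actual expansion code or its LOGICAL_WORDS format.

```python
def expand_prefix(prefix, wordlist):
    """Return every indexed word that starts with prefix (prefix fuzzy match)."""
    return sorted(w for w in wordlist if w.startswith(prefix))

def ored_query(words):
    """Join the expanded words the way an OR search would treat them."""
    return " or ".join(words)

def extra_matches(expanded, doc_words):
    """Number of matches beyond the first; each one is where
    multimatch_factor would come into play."""
    hits = sum(1 for w in expanded if w in doc_words)
    return max(hits - 1, 0)

wordlist = ["indent", "index", "indexed", "indexing"]
expanded = expand_prefix("index", wordlist)
print(ored_query(expanded))  # index or indexed or indexing
print(extra_matches(expanded, {"index", "indexing", "search"}))  # 1
```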

I did also find some problems in htdig's handling of location values
that may, in some cases, cause negative weights for words, but then
you'd see those in your db.wordlist.  I'll try to tackle those when I
make the necessary changes to WordList::Word() to support location_factor.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
