According to Alexander I. Lebedev:
> Gilles,
> 
> Thank you for your answer.

It was Geoff who answered you last time, but I'm sure you're welcome.  :-)

> >To quote from the documentation: (attrs.html#search_algorithm)
> > Each word is first reduced to its word root and then all known legal
> > endings are used for the matching.
> >
> >I think the bug basically comes up because some subset of the
> >permutations are also root words. In Endings::getWords, if a word is
> >already a root word, then it doesn't bother to check whether it's also a
> >permutation.
> 
> I'm afraid the origin of the bug is different.  I tested your idea on
> one indexed Russian site (26,000 documents) and found the same bug
> in the case where the word I'm searching for is not itself a root (but
> has two different roots).  So I guess the program stops searching
> when it finds the first occurrence of the word, not all of them. (Indeed,
> in Endings::getWords I don't see a loop that tests whether there are other
> roots.)
> 
> - Alexander
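
To make the behaviour Alexander suspects concrete, here's a toy sketch in
Python.  The dictionary and the names ROOT2WORDS, WORD2ROOTS and expand()
are made up for illustration; this is not htdig's Endings code, only a model
of "expand the first root found" versus "expand every root" for a word that
is not itself a root.

```python
# Hypothetical toy dictionary (not htdig's real data):
# root -> words formed from it with legal endings.
ROOT2WORDS = {
    "leaf": ["leaf", "leaves"],
    "leave": ["leave", "leaves", "leaving"],
}

# Inverse map: each derived word -> every root it can be reduced to,
# in the order the roots appear in ROOT2WORDS.
WORD2ROOTS = {}
for root, words in ROOT2WORDS.items():
    for w in words:
        WORD2ROOTS.setdefault(w, []).append(root)


def expand(word, all_roots=True):
    """Return the set of words searched for `word`.

    all_roots=False keeps only the first root found (here, "first" just
    means first in dictionary order); all_roots=True expands every root.
    Unknown words are searched as-is.
    """
    roots = WORD2ROOTS.get(word, [word])
    if not all_roots:
        roots = roots[:1]
    result = set()
    for root in roots:
        result.update(ROOT2WORDS.get(root, [root]))
    return result


print(sorted(expand("leaves", all_roots=False)))  # ['leaf', 'leaves']
print(sorted(expand("leaves", all_roots=True)))   # ['leaf', 'leave', 'leaves', 'leaving']
```

In this model, "leaves" has two roots, and stopping at the first one silently
drops "leave" and "leaving" from the search.
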

What Geoff is describing as a bug in the Endings algorithm is actually a
deliberate change, submitted by Steve Arlow back in June 1999.  It was to
prevent the -ness suffix from being stripped on words like witness, and then
having the word "wit" expanded with a number of inappropriate suffixes.
That change was incorporated in version 3.1.3.  However, it does indeed
appear to be the cause of the problem with the whole "skate" vs. "skater"
test.

I checked htsearch from 3.1.2, and its unpatched endings algorithm produced
the same results with "skate" or "skater", i.e.

   (skater or skate or skated or skating or skates or skaters)
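
Here's a toy Python model of the trade-off, assuming the 3.1.3 change works
the way Geoff describes it (a word that is already a root is not reduced any
further).  The dictionary, the names ROOT2WORDS, WORD2ROOTS and expand(), and
the keep_roots flag are all hypothetical; this is a sketch of the idea, not
the actual Endings implementation.

```python
# Hypothetical toy dictionary: root -> words formed from it with legal endings.
# "skater" and "witness" are roots in their own right, but are also derived
# forms of "skate" and "wit".
ROOT2WORDS = {
    "skate": ["skate", "skated", "skating", "skates", "skater", "skaters"],
    "skater": ["skater", "skaters"],
    "wit": ["wit", "wits", "witness"],
    "witness": ["witness", "witnessed", "witnesses"],
}

# Inverse map: each derived word -> every root it can be reduced to.
WORD2ROOTS = {}
for root, words in ROOT2WORDS.items():
    for w in words:
        WORD2ROOTS.setdefault(w, []).append(root)


def expand(word, keep_roots=False):
    """Return the set of words searched for `word`.

    keep_roots=False: reduce the word to all of its roots, then expand each
    (the 3.1.2-style behaviour).  keep_roots=True: if the word is already a
    root, expand only that root (the change as described in this thread).
    """
    if keep_roots and word in ROOT2WORDS:
        roots = [word]
    else:
        roots = WORD2ROOTS.get(word, [word])
    result = set()
    for root in roots:
        result.update(ROOT2WORDS.get(root, [root]))
    return result


print(sorted(expand("skater", keep_roots=True)))   # ['skater', 'skaters']
print(sorted(expand("witness", keep_roots=True)))  # ['witness', 'witnessed', 'witnesses']
```

With keep_roots=False, "skate" and "skater" give the same six words, but
"witness" also pulls in "wit" and "wits".  With keep_roots=True, "witness"
stays clean, but "skater" no longer matches "skate" or "skating", which is
the asymmetry in the skater test.
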

So, either the problem you've run into with Russian words is different from
the skater test, or going back to 3.1.2 will solve the problem for you.

In either case, I think Steve Arlow's patch is more far-reaching than any
of us thought before.  It seems to me that this should be optional, unless
we can find a smarter way of doing this.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
