RE: [htdig] a flaw in search algorithm (synonyms)

Quim Sanmarti Tue, 18 Sep 2001 01:03:29 -0700
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Gilles Detillieux
> Sent: Monday, 17 September 2001 19:01
> To: Alexander I. Lebedev
> CC: [EMAIL PROTECTED]
[snip]
> Yes, this is a known limitation of the current fuzzy match algorithms.
> Fuzzy matches are only applied directly to the original search words,
> and not to the fuzzy match words of other algorithms.  The same problem
> exists with results from the endings algorithm not being also processed
> by the accents algorithm.
>
> I think the solution in general would be to run the fuzzy algorithms
> iteratively until no new search words are generated.  These iterations
> may only be necessary for dictionary-based algorithms, if these are
> processed before any word database-based algorithms.  I'm not certain
> of this last point - it just occurred to me.  Certainly, though, some
> sort of iterative process would be needed.  I've given this some thought
> before, but I don't think it's quite as easy as it sounds to implement
> this reliably.
>

I also feel that the eventual evolution of fuzzy handling should be somewhat
more flexible than Alexander's proposal. Not everybody will want to expand
words like that.
Keep in mind that extensive, uncontrolled word expansion may introduce
unexpected semantic drift into the resulting query, which will then probably
return nonsensical results. The expanded queries risk becoming an ORed
list of relatively unrelated words.
Iterating queries through the fuzzy expanders sounds good, but the exact
combination of expanders should perhaps be tailorable to each language,
domain and intended use. A default policy can be defined, but I believe
the whole thing should be configurable.
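
To make the idea concrete, here is a minimal sketch of iterative expansion
with a configurable policy. It is written in Python for brevity and is not
ht://Dig code: the names expand_query, make_dict_expander and max_rounds are
invented for illustration, and the lookup tables stand in for whatever
dictionaries the real fuzzy algorithms use. The list of expanders and the
round limit are the two knobs a site could tune.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set

# Hypothetical interface: an expander maps one word to a set of related words.
Expander = Callable[[str], Set[str]]


def make_dict_expander(table: Dict[str, Iterable[str]]) -> Expander:
    """Build an expander from a lookup table (stand-in for a synonym
    or endings dictionary; not the ht://Dig data format)."""
    def expand(word: str) -> Set[str]:
        return set(table.get(word, ()))
    return expand


def expand_query(words: Iterable[str],
                 expanders: List[Expander],
                 max_rounds: Optional[int] = None) -> Set[str]:
    """Run every expander over the newly found words, round after round,
    until no new word appears or max_rounds is reached.

    max_rounds=1 expands only the original search words;
    max_rounds=None iterates until nothing new is generated.
    """
    seen: Set[str] = set(words)
    frontier: Set[str] = set(seen)
    rounds = 0
    while frontier and (max_rounds is None or rounds < max_rounds):
        generated: Set[str] = set()
        for word in frontier:
            for expand in expanders:
                generated |= expand(word)
        # Only words not seen before are expanded in the next round.
        frontier = generated - seen
        seen |= frontier
        rounds += 1
    return seen


# Toy dictionaries, purely illustrative.
synonyms = make_dict_expander({"car": ["automobile"]})
endings = make_dict_expander({"car": ["cars"], "automobile": ["automobiles"]})

single_pass = expand_query(["car"], [synonyms, endings], max_rounds=1)
fixed_point = expand_query(["car"], [synonyms, endings])
synonyms_only = expand_query(["car"], [synonyms])
```

With these toy tables, a single pass misses "automobiles" because the
endings expander never sees the synonym, while the unbounded run picks it up.
Dropping an expander from the list, or lowering max_rounds, is how a site
would limit the drift described above.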

--
Quim



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html