Re: [htdig-dev] Porter-type Stemming Algorithms

Gilles Detillieux Wed, 09 Jan 2002 12:22:36 -0800

According to Neal Richter:
> > But keep in mind that the way the current Endings algorithm works is
> > slightly different than most "stemming" approaches. All words are indexed
> > as-is and then at search time, the fuzzy algorithm can add additional
> > "fuzzy query words" to the user query (at usually lower weight).
> 
>       Ah.. I was thinking of feeding the indexer pre-stemmed text as
> well as the original text.  User queries would by default be stemmed
> before querying, unless "" or the + are used.
> 
>       This method is pretty common in the IR research
> community.  Although it is worth mentioning that there is some controversy
> over whether stemming increases or decreases accuracy in general.  
> Probably very dataset and desired results dependent...


All the more reason to stick with the existing htfuzzy framework, so that
stemming can be turned on/off at search time.  The actual stemming technique
can be applied either way - whether it's done at indexing time (within
htdig) or just after indexing by htfuzzy.  The framework used by algorithms
like soundex, metaphone and accents could be applied equally well to the
stemming algorithms, so I'd recommend studying these algorithms to figure
out how to add the new stemming code to the mix.
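
To make the search-time approach concrete, here is a rough sketch in Python
(not the actual htfuzzy code, which is C++).  Everything in it is an
assumption for illustration: toy_stem is a crude suffix stripper standing in
for a real Porter-type stemmer, and build_fuzzy_db, expand_query and the 0.5
weight are made-up names and values.  Words are indexed as-is, a stem
database is built after indexing, and the query is expanded with same-stem
words at lower weight only if stemming is enabled.

```python
from collections import defaultdict


def toy_stem(word):
    """Crude suffix stripper; a hypothetical stand-in for a Porter-type stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_fuzzy_db(indexed_words):
    """Run after indexing: map each stem to the as-is indexed words sharing it."""
    db = defaultdict(set)
    for w in indexed_words:
        db[toy_stem(w)].add(w.lower())
    return db


def expand_query(words, fuzzy_db, use_stemming=True, fuzzy_weight=0.5):
    """Return (word, weight) pairs: original words at 1.0, variants at lower weight."""
    expanded = []
    for w in words:
        w = w.lower()
        expanded.append((w, 1.0))
        if use_stemming:
            for variant in sorted(fuzzy_db.get(toy_stem(w), ())):
                if variant != w:
                    expanded.append((variant, fuzzy_weight))
    return expanded


# The index itself is untouched by stemming; only the query is expanded.
index_words = ["index", "indexed", "indexing", "search", "searches"]
db = build_fuzzy_db(index_words)
with_stemming = expand_query(["indexing"], db)
without_stemming = expand_query(["indexing"], db, use_stemming=False)
```

Because the index holds only the original words, turning stemming off is
just a matter of skipping the expansion step; nothing needs to be reindexed.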

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
