According to Neal Richter:
> > But keep in mind that the way the current Endings algorithm works is
> > slightly different than most "stemming" approaches. All words are indexed
> > as-is and then at search time, the fuzzy algorithm can add additional
> > "fuzzy query words" to the user query (at usually lower weight).
>
> Ah.. I was thinking of feeding the indexer pre-stemmed text as
> well as the original text. User queries would by default be stemmed
> before querying, unless "" or the + are used.
>
> This method is pretty common in the IR research
> community. Although it is worth mentioning that there is some controversy
> over whether stemming increases or decreases accuracy in general.
> Probably very dataset and desired results dependent...

All the more reason to stick with the existing htfuzzy framework, so that
stemming can be turned on/off at search time. The actual stemming technique
can be applied either way - whether it's done at indexing time (within htdig)
or just after indexing by htfuzzy. The framework used by algorithms like
soundex, metaphone and accents could be applied equally well to the stemming
algorithms, so I'd recommend studying these algorithms to figure out how to
add the new stemming code to the mix.

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
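
To make the search-time approach discussed above concrete, here is a rough sketch in Python. It is not htfuzzy's actual code or data format: `toy_stem`, `build_stem_db`, `expand_query` and the 0.5 weight are all hypothetical names and values chosen for illustration, and the suffix stripper is a crude stand-in for a real stemmer. The idea it shows is that words are indexed as-is, a stem-to-words table is built after indexing, and the query is expanded with lower-weight variants only when stemming is switched on at search time.

```python
from collections import defaultdict

# Toy suffix list; a real stemmer (e.g. Porter) is far more careful.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def build_stem_db(indexed_words):
    """Run after indexing: map each stem to the indexed words sharing it."""
    db = defaultdict(set)
    for word in indexed_words:
        db[toy_stem(word)].add(word.lower())
    return db


def expand_query(words, stem_db, use_stemming=True, fuzzy_weight=0.5):
    """Return {word: weight}; the user's own words always get weight 1.0."""
    weighted = {}
    for word in words:
        word = word.lower()
        weighted[word] = 1.0
        if not use_stemming:
            continue
        # Add variants sharing the stem, at lower weight, without
        # overriding any word that is already in the query.
        for variant in stem_db.get(toy_stem(word), ()):
            weighted.setdefault(variant, fuzzy_weight)
    return weighted


stem_db = build_stem_db(["index", "indexing", "indexed", "indexes", "search"])
print(expand_query(["indexing"], stem_db))
print(expand_query(["indexing"], stem_db, use_stemming=False))
```

Because the index itself holds only unstemmed words, turning stemming off costs nothing more than skipping the expansion step, which is the flexibility argued for above.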
