Just ran this method on 4500 words ending in "s" in my index and results looks good but I'm tempted to remove this line:
!word.endsWith("ses") )
With it removed I saw 3 oddities moses=mose gases=gase viruses=viruse but I got 100+ extra stems that were OK:
Stemming doesn't have to produce intelligible words, it's the lemmatization that does. As long as the stem is unique, and all inflected forms of a single base form map to the same stem, it's ok.
In the case above the probability of another word producing the same stem "mose" is very low, so this stem is ok, too.
-- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __________________________________ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]