On May 22, 2005, at 1:53 PM, Steve Legrand wrote:
Does the Java version of Snowball use the Porter or the Porter2 stemming
algorithm in its EnglishStemmer, available from the Lucene Sandbox?
If it is Porter2, the word "his" should be indexed as "his", not
as "hi" as it is at the moment.
I don't know which algorithm it uses, but there are three
different SnowballAnalyzer stemmers for English: "English", "Lovins"
and "Porter". I just ran each of them through the
AnalyzerDemo and got this output when analyzing the string "his hiss
history":
SnowballAnalyzer: // English
[his] [hiss] [histori]
SnowballAnalyzer: // Lovins
[his] [his] [history]
SnowballAnalyzer: // Porter
[hi] [hiss] [histori]
Both "English" and "Lovins" leave "his" alone, while "Porter" cuts it
down to "hi". "Lovins" also keeps "history" intact, but it reduces
"hiss" to "his" and does a bad job with words like "country" and
"countries".
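
For what it's worth, the different results for "his" from "Porter" and
"English" are consistent with step 1a of the published Porter and Porter2
algorithms. Below is a minimal sketch of that one step in plain Java. It
assumes the "English" stemmer implements Porter2, which I have not checked
against the Snowball source. The class and method names are made up for
illustration; this is not the Lucene or Snowball code, and it leaves out
all the other steps, Porter2's exception lists and its special handling
of "y".

```java
// Illustrative sketch of step 1a only (plural handling).
// Hypothetical names; NOT the Snowball/Lucene implementation.
final class Step1aSketch {

    // Simplification: treats 'y' as a vowel everywhere.
    private static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    // Original Porter, step 1a:
    //   SSES -> SS, IES -> I, SS -> SS, S -> (removed)
    static String porter(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2);
        if (w.endsWith("ss"))   return w;
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1);
        return w;
    }

    // Porter2, step 1a: the trailing "s" is removed only if the part
    // before it contains a vowel that is not immediately before the "s".
    static String porter2(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ied") || w.endsWith("ies")) {
            // "i" if preceded by more than one letter, otherwise "ie"
            return w.length() > 4
                ? w.substring(0, w.length() - 2)
                : w.substring(0, w.length() - 1);
        }
        if (w.endsWith("us") || w.endsWith("ss")) return w;
        if (w.endsWith("s")) {
            // Look for a vowel anywhere before the letter preceding "s".
            for (int i = 0; i <= w.length() - 3; i++) {
                if (isVowel(w.charAt(i))) {
                    return w.substring(0, w.length() - 1);
                }
            }
        }
        return w;
    }
}
```

With this sketch, Step1aSketch.porter("his") gives "hi", while
Step1aSketch.porter2("his") leaves it as "his". Both leave "hiss"
unchanged.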
Erik