On May 22, 2005, at 1:53 PM, Steve Legrand wrote:

Does the Java version of Snowball employ the Porter or the Porter2 stemming algorithm in its EnglishStemmer, available from the Lucene Sandbox? If it is Porter2, I should get the word "his" indexed as "his", not as "hi" as it is at the moment.

I don't know the specifics of which algorithm, but there are three different SnowballAnalyzer stemmers for English - "English", "Lovins" and "Porter". I just ran each of the English stemmers with the AnalyzerDemo and got this output analyzing the string "his hiss history":

  SnowballAnalyzer:  // English
    [his] [hiss] [histori]

  SnowballAnalyzer:  // Lovins
    [his] [his] [history]

  SnowballAnalyzer:  // Porter
    [hi] [hiss] [histori]

Only the "Lovins" one does what seems to be the right thing with "his", except that it does a bad job with words like "country" and "countries".
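
The [hi] versus [his] difference above is consistent with how the two Porter-family algorithms treat a trailing "s" in their first step. The following is only a minimal sketch of that one step (step 1a), assuming "Porter" is the original algorithm and "English" is the revised one commonly called Porter2. The class and method names are made up for illustration and are not part of Snowball or Lucene, and the vowel test is simplified (it always treats "y" as a vowel).

```java
// Hypothetical illustration only: NOT the Snowball/Lucene implementation.
class Step1aDemo {

    // Step 1a of the original Porter algorithm: strip plural suffixes.
    static String porterStep1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // caresses -> caress
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2); // ponies -> poni
        if (w.endsWith("ss"))   return w;                              // hiss -> hiss
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1); // his -> hi
        return w;
    }

    // Simplification: Porter2 treats some occurrences of 'y' as a consonant.
    static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    // Step 1a of the revised (Porter2) algorithm.
    static String porter2Step1a(String w) {
        if (w.length() <= 2) return w;                     // very short words are left alone
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ied") || w.endsWith("ies")) {
            // "i" if preceded by more than one letter, otherwise "ie"
            return w.length() > 4 ? w.substring(0, w.length() - 2)
                                  : w.substring(0, w.length() - 1);
        }
        if (w.endsWith("us") || w.endsWith("ss")) return w;
        if (w.endsWith("s")) {
            // Delete the "s" only if an earlier part of the word contains a
            // vowel that is not the letter immediately before the "s".
            for (int i = 0; i < w.length() - 2; i++) {
                if (isVowel(w.charAt(i))) return w.substring(0, w.length() - 1);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"his", "hiss", "cats"}) {
            System.out.println(w + ": porter=" + porterStep1a(w)
                    + " porter2=" + porter2Step1a(w));
        }
    }
}
```

Under these assumptions "his" becomes "hi" in the original step but stays "his" in the revised one, because its only vowel sits directly before the "s". Later steps of both algorithms are omitted, so this does not reproduce full stems such as [histori].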

    Erik
