Massimo Pasquini created LUCENE-6138:
----------------------------------------

             Summary: ItalianLightStemmer
                 Key: LUCENE-6138
                 URL: https://issues.apache.org/jira/browse/LUCENE-6138
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 4.10.2
            Reporter: Massimo Pasquini
            Priority: Minor


I expect a stemmer to reduce the singular and plural forms of a noun to a 
shorter common form. The implementation of ItalianLightStemmer doesn't 
apply any stemming to words shorter than 6 characters. This leads to 
some annoying results:

singular form | plural form

4 | 5 characters in length (no stemming on either form)
alga -> alga | alghe -> alghe
fuga -> fuga | fughe -> fughe
lega -> lega | leghe -> leghe

5 | 6 characters in length (stemming only on the plural form)
vanga -> vanga | vanghe -> vang
verga -> verga | verghe -> verg
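
The effect can be reproduced with a small self-contained sketch. This is not Lucene's source: the class LengthGuardSketch, the method stem, and the minLength parameter are hypothetical names, and the suffix rule is a simplified assumption chosen only to match the outputs listed above. It shows how a length guard of 6 leaves the shorter form untouched, and what a lower guard would do.

```java
// Hypothetical toy stemmer for illustration only; NOT Lucene's ItalianLightStemmer.
class LengthGuardSketch {

    // Strips a final vowel (and a preceding 'h' or 'i' where the assumed rule
    // says so), but only when the word has at least minLength characters.
    static String stem(String word, int minLength) {
        int len = word.length();
        // Never touch words shorter than 3 chars, so len - 2 stays valid.
        if (len < Math.max(minLength, 3)) {
            return word;
        }
        char last = word.charAt(len - 1);
        char prev = word.charAt(len - 2);
        switch (last) {
            case 'e':
            case 'i':
                // Assumed rule: "-he", "-hi", "-ie", "-ii" lose two characters.
                return word.substring(0, (prev == 'h' || prev == 'i') ? len - 2 : len - 1);
            case 'a':
            case 'o':
                // Assumed rule: "-ia", "-io" lose two characters.
                return word.substring(0, prev == 'i' ? len - 2 : len - 1);
            default:
                return word;
        }
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"alga", "alghe"}, {"fuga", "fughe"}, {"lega", "leghe"},
            {"vanga", "vanghe"}, {"verga", "verghe"}
        };
        for (int minLength : new int[] {6, 4}) {
            System.out.println("minLength = " + minLength);
            for (String[] p : pairs) {
                System.out.println("  " + p[0] + " -> " + stem(p[0], minLength)
                        + " | " + p[1] + " -> " + stem(p[1], minLength));
            }
        }
    }
}
```

With the guard at 6 the sketch gives the results listed above. With the guard at 4 each pair collapses to one stem (alg, fug, leg, vang, verg), but other short words are trimmed as well, for example casa -> cas, which may be the kind of side effect the limit is meant to avoid.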

I suppose the limit on word length is there to avoid side effects on other 
short words not in the set above, but I think the code should be reviewed 
to give better results.


