I found the issue that added the isLetter() check to FrenchLightStemmer:
https://issues.apache.org/jira/browse/LUCENE-4063

It seems the same change was never applied to FrenchMinimalStemmer.
Would it be a good idea to add the same check there, to avoid overly
aggressive stemming?
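
To make the idea concrete, here is a rough stand-alone sketch of the
doubled-character step with and without the guard. This is not the
Lucene source: the class and method names are made up for illustration,
and it only models the single "last two characters are identical" rule
discussed in this thread.

```java
// Hypothetical stand-alone sketch; NOT the actual Lucene stemmer code.
final class DoubledSuffixSketch {

    // Behaviour described for the minimal stemmer: drop the last character
    // whenever the last two characters are identical, letters or not.
    static int unguarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]) {
            return len - 1;
        }
        return len;
    }

    // Proposed variant: drop the last character only when it is a letter,
    // so numeric tokens such as "123455" are left untouched.
    static int guarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2] && Character.isLetter(s[len - 1])) {
            return len - 1;
        }
        return len;
    }

    public static void main(String[] args) {
        char[] number = "123455".toCharArray();
        char[] word = "appell".toCharArray();
        // Without the guard the numeric token loses its last digit.
        System.out.println(new String(number, 0, unguarded(number, number.length))); // 12345
        // With the guard the numeric token is kept as-is.
        System.out.println(new String(number, 0, guarded(number, number.length)));   // 123455
        // Alphabetic tokens are still shortened.
        System.out.println(new String(word, 0, guarded(word, word.length)));         // appel
    }
}
```

The guard relies only on java.lang.Character.isLetter(char), so alphabetic
tokens are handled exactly as before and only non-letter endings change.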

Tomoko

On Sat, Jul 27, 2019 at 20:29, Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
>
> Hi Adrien,
>
> To me, this simply sounds like a bug. Could you please open a JIRA
> issue (with a patch if possible)?
>
> Tomoko
>
> On Tue, Jul 23, 2019 at 22:05, Adrien Gallou <adriengal...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm using both light and minimal French stemmers and encountered an issue
> > when using the minimal stemmer.
> >
> > The light stemmer removes the last character of a word if the last two
> > characters are identical.
> > We can see that here:
> > https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
> > In this light stemmer, there is a check to avoid altering the token if the
> > token is a number.
> >
> > The minimal stemmer also removes the last character of a word if the last
> > two characters are identical.
> > We can see that here:
> > https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77
> >
> > But in this minimal stemmer there is no check to see whether the
> > character is a letter.
> > So numeric tokens whose last two characters are identical are altered.
> >
> > Is there a reason for this?
> > Should I file an issue on Jira to add this check?
> >
> > Thanks,
> >
> > Adrien Gallou
