I found an issue that added the isLetter() check to FrenchLightStemmer: https://issues.apache.org/jira/browse/LUCENE-4063
It seems the same change has not been applied to FrenchMinimalStemmer. Would it be a good idea to add the same check there to avoid overly aggressive stemming?

Tomoko

On Sat, Jul 27, 2019 at 20:29, Tomoko Uchida <tomoko.uchida.1...@gmail.com> wrote:
>
> Hi Adrien,
>
> To me, it simply sounds like a bug. Can you please open a JIRA (with a
> patch if possible)?
>
> Tomoko
>
> On Tue, Jul 23, 2019 at 22:05, Adrien Gallou <adriengal...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm using both the light and minimal French stemmers and encountered an issue
> > when using the minimal stemmer.
> >
> > The light stemmer removes the last character of a word if the last two
> > characters are identical.
> > We can see that here:
> > https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
> > In this light stemmer, there is a check to avoid altering the token if the
> > token is a number.
> >
> > The minimal stemmer also removes the last character of a word if the last
> > two characters are identical.
> > We can see that here:
> > https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77
> >
> > But in this minimal stemmer there is no check of whether the character is a
> > letter.
> > So numeric tokens whose last two characters are identical get
> > altered.
> >
> > Is there a reason for this?
> > Should I file an issue on Jira to add this check?
> >
> > Thanks,
> >
> > Adrien Gallou
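To make the difference concrete, here is a minimal Java sketch. It assumes, as described in this thread, that both stemmers end with a step that drops the last character when it equals the one before it. This is not the Lucene source: the class and method names (`DoubledCharStepSketch`, `stripDoubledUnguarded`, `stripDoubledGuarded`, `apply`) are hypothetical, and the sketch deliberately omits the other suffix rules and minimum-length checks of the real classes. It only isolates the effect of the `Character.isLetter()` guard.

```java
// Hypothetical illustration only, NOT the Lucene source code. It isolates the
// "drop a doubled final character" step discussed in this thread, with and
// without the Character.isLetter() guard that FrenchLightStemmer is described as having.
class DoubledCharStepSketch {

    // Unguarded variant (as described for FrenchMinimalStemmer):
    // strips the last char whenever it equals the one before it.
    static int stripDoubledUnguarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]) {
            return len - 1;
        }
        return len;
    }

    // Guarded variant (as described for FrenchLightStemmer):
    // only strips the duplicate when it is a letter, so digits are left alone.
    static int stripDoubledGuarded(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2] && Character.isLetter(s[len - 1])) {
            return len - 1;
        }
        return len;
    }

    // Helper: run one variant on a String and return the kept prefix.
    static String apply(String token, boolean guarded) {
        char[] buf = token.toCharArray();
        int newLen = guarded
                ? stripDoubledGuarded(buf, buf.length)
                : stripDoubledUnguarded(buf, buf.length);
        return new String(buf, 0, newLen);
    }

    public static void main(String[] args) {
        for (String token : new String[] {"123455", "gross"}) {
            System.out.println(token + " -> unguarded: " + apply(token, false)
                    + ", guarded: " + apply(token, true));
        }
    }
}
```

Running it prints `123455 -> unguarded: 12345, guarded: 123455` and `gross -> unguarded: gros, guarded: gros`. The guard leaves the numeric token intact while the letter-based case is stemmed exactly as before.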