Hi, I'm using both the light and minimal French stemmers and have run into an issue with the minimal stemmer.
The light stemmer removes the last character of a word if its last two characters are identical. We can see that here:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchLightStemmer.java#L263
In the light stemmer, there is a check that avoids altering the token if it is a number.

The minimal stemmer also removes the last character of a word if its last two characters are identical. We can see that here:
https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/apache/lucene/analysis/fr/FrenchMinimalStemmer.java#L77
But in the minimal stemmer there is no check on whether the character is a letter. As a result, numeric tokens whose last two characters are identical are altered.

Is there a reason for this? Should I file an issue on Jira to add this check?

Thanks,
Adrien Gallou
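
To make the difference concrete, here is a minimal standalone sketch of the trailing-duplicate rule with and without a letter check. It is not the Lucene code itself: the class and method names (TrailingDuplicateDemo, stripUnchecked, stripLettersOnly, apply) are hypothetical, and it isolates only this one rule, assuming the rest of each stemmer leaves the token untouched.

```java
public class TrailingDuplicateDemo {

    // Hypothetical helper: drops the last character when the last two are
    // identical, with no check on the kind of character (the behavior
    // described for the minimal stemmer).
    static int stripUnchecked(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2]) {
            len--;
        }
        return len;
    }

    // Hypothetical helper: same rule, but only applied when the repeated
    // character is a letter (the behavior described for the light stemmer).
    static int stripLettersOnly(char[] s, int len) {
        if (len >= 2 && s[len - 1] == s[len - 2] && Character.isLetter(s[len - 1])) {
            len--;
        }
        return len;
    }

    // Applies one of the two variants to a token and returns the result.
    static String apply(String token, boolean lettersOnly) {
        char[] buf = token.toCharArray();
        int len = lettersOnly
                ? stripLettersOnly(buf, buf.length)
                : stripUnchecked(buf, buf.length);
        return new String(buf, 0, len);
    }

    public static void main(String[] args) {
        // "1234555" is a numeric token; "abcdell" is an arbitrary letter-only token.
        for (String token : new String[] {"1234555", "abcdell"}) {
            System.out.println(token
                    + " unchecked=" + apply(token, false)
                    + " lettersOnly=" + apply(token, true));
        }
    }
}
```

With the unchecked variant the numeric token 1234555 loses its last digit and becomes 123455, while the letters-only variant leaves it unchanged. Both variants shorten abcdell to abcdel.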