[GitHub] [lucene] xaviersanchez commented on pull request #461: LUCENE-10248: Spanish Plural Stemmer

GitBox Mon, 29 Nov 2021 07:21:04 -0800


xaviersanchez commented on pull request #461:
URL: https://github.com/apache/lucene/pull/461#issuecomment-981735988



   > Hi @xaviersanchez, this contribution looks great.
   > 
   > I'll do another review pass and give others some time to review as well.
   > 
   > I took a quick look, and I find it confusing that the current `SpanishMinimalStemmer` does aggressive conversions such as `ñ -> n`. As a follow-up issue, I think we should deprecate (`@Deprecated`) the `SpanishMinimalStemmer` and point users to this one instead?
   > 
   > `SpanishMinimalStemmer` is not a typical "upstream" algorithm backed by academic papers or studies from Snowball or Savoy, and there doesn't seem to be any reason to keep it anymore, except for legacy indexes. So we could keep it around for another major release or so, but not forever, IMO.
   
   Thanks @rmuir for the comment! 
   
   Yes, I agree we could deprecate `SpanishMinimalStemmer` and point users to this implementation, since it covers the same use cases. We implemented this a while ago, so before contributing our code we analyzed how the existing Spanish stemmers behave, just to check that we could provide some added value. Our analysis showed that `SpanishMinimalStemmer` has some issues and applies quite aggressive text normalization.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



