[ https://issues.apache.org/jira/browse/LUCENE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538398#comment-14538398 ]

Uwe Schindler commented on LUCENE-6462:
---------------------------------------

Hi Niki,
the Latin stemmer was originally written by Markus Klose. I would prefer that you
submit your patch to his GitHub repository: https://github.com/scherziglu/solr

It could go into Lucene if Markus wants to donate his whole code (including the
TokenFilters). The stemmer alone (as attached to this issue) is not very helpful.

In general, stemmers do not need to produce "correct" forms; they should just
"normalize" terms to something that can be compared with other terms during
query execution. So before making changes to a stemmer, it is very important to
test those changes against a corpus of Latin texts and compare the results of
queries on it. For search engines, stemmers should also be light (so as not to
remove too much information).
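To illustrate the "normalize, not correct" point, here is a minimal, hypothetical sketch in Java. It is not Markus Klose's stemmer or the attached LatinStemmer.java; the class name, the suffix list (a few first-declension endings only) and the minimum stem length are assumptions chosen just for this example.

```java
import java.util.Locale;

// Hypothetical light Latin stemmer sketch; not the code discussed in LUCENE-6462.
class LightLatinStemmerSketch {

    // Hypothetical, deliberately small suffix list (first-declension endings only),
    // ordered longest first so "-arum" is tried before the shorter endings.
    private static final String[] SUFFIXES = {"arum", "ae", "am", "as", "is", "a"};

    // Hypothetical minimum stem length: short words such as "ea" stay intact.
    private static final int MIN_STEM = 2;

    // Strips at most one suffix ("light" stemming). The result is an index key,
    // not a dictionary form.
    static String stem(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (t.endsWith(suffix) && t.length() - suffix.length() >= MIN_STEM) {
                return t.substring(0, t.length() - suffix.length());
            }
        }
        return t;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"rosa", "rosae", "rosam", "rosarum", "rosis"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

All five forms map to the key "ros". That is not the dictionary form "rosa", but it lets a query for one form match documents containing the others. Whether rules like these help or hurt retrieval is what the corpus tests described above would have to show.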

In addition, this code has several problems. For example, why does it look up
the -que forms in a List instead of a CharArraySet?
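A minimal sketch of what a set-based -que lookup could look like. In Lucene, CharArraySet.contains(char[], int, int) checks the token's char[] buffer directly without allocating a String. To keep the sketch compilable without Lucene on the classpath, it uses a java.util.HashSet<String> as a stand-in, which at least replaces the linear List.contains scan with a hash lookup. The class name, method names and the short exception list are hypothetical and not taken from the attached patch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of enclitic "-que" stripping; not the attached LatinStemmer.java.
class QueStripperSketch {

    // Hypothetical, abbreviated exception list: words in which "-que" belongs to
    // the word itself rather than being the enclitic meaning "and".
    // In Lucene this would be a CharArraySet; HashSet<String> is a stand-in here.
    private static final Set<String> QUE_EXCEPTIONS = new HashSet<>(Arrays.asList(
            "atque", "neque", "quoque", "itaque", "usque", "ubique", "undique"));

    // Returns the new term length for buffer[0..len), assuming lowercased input.
    // "-que" is removed unless the whole term is a listed exception. The
    // (buffer, length) signature mirrors how a TokenFilter sees a term.
    static int stripQue(char[] buffer, int len) {
        if (len <= 3 || !endsWithQue(buffer, len)) {
            return len;
        }
        // Hash lookup: O(1) on average, instead of scanning a List element by element.
        // CharArraySet.contains(buffer, 0, len) would also avoid this String allocation.
        if (QUE_EXCEPTIONS.contains(new String(buffer, 0, len))) {
            return len;
        }
        return len - 3;
    }

    private static boolean endsWithQue(char[] b, int len) {
        return b[len - 3] == 'q' && b[len - 2] == 'u' && b[len - 1] == 'e';
    }

    public static void main(String[] args) {
        for (String w : new String[] {"populusque", "atque", "senatus"}) {
            char[] buf = w.toCharArray();
            int newLen = stripQue(buf, buf.length);
            System.out.println(w + " -> " + new String(buf, 0, newLen));
        }
    }
}
```

Here "populusque" becomes "populus", while "atque" is left alone because it is in the exception set.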

> Latin Stemmer for lucene
> ------------------------
>
>                 Key: LUCENE-6462
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6462
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Niki
>         Attachments: LatinStemmer.java
>
>
> The latest Lucene package has no stemmer for the Latin language. I have a
> rule-based stemmer for Latin, built on the grammar and rules of the
> language.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
