[
https://issues.apache.org/jira/browse/LUCENE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538398#comment-14538398
]
Uwe Schindler edited comment on LUCENE-6462 at 5/11/15 6:43 PM:
----------------------------------------------------------------
Hi Niki,
the Latin stemmer was originally written by Markus Klose. I would prefer that
you submit your patch to his GitHub repository: https://github.com/scherziglu/solr
If Markus wants to donate his whole code (including the TokenFilters) to
Lucene, he should do this himself so that his work is properly attributed.
The stemmer alone (as attached to this issue) is not very helpful.
In general, stemmers do not need to produce "correct" forms; they should
just "normalize" terms to something that can be compared with other terms
during query execution. So before making changes to a stemmer, it is very
important to test those changes against a corpus of Latin texts and compare
the results of queries on them. For search engines, stemmers should also be
light (so as not to remove too much information).
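To illustrate what "normalize" means here, the following is a minimal sketch of
a toy light stemmer. It is not the code attached to this issue; the class name,
the suffix list and the minimum stem length are assumptions chosen only for the
example. The stem it produces is not a valid Latin word, but all inflected forms
map to the same term, which is all that matching at query time requires.

```java
final class LightStemSketch {
    // Hypothetical, abbreviated suffix list, ordered longest first.
    private static final String[] SUFFIXES = {
        "orum", "ibus", "us", "um", "is", "i", "o", "a", "e"
    };

    // Strips the first matching suffix, keeping at least three characters
    // so that short words are left untouched (a "light" stemmer).
    static String stem(String term) {
        for (String suffix : SUFFIXES) {
            int stemLength = term.length() - suffix.length();
            if (term.endsWith(suffix) && stemLength >= 3) {
                return term.substring(0, stemLength);
            }
        }
        return term;
    }

    public static void main(String[] args) {
        // All four forms of "amicus" collapse to the same index term.
        for (String form : new String[] {"amicus", "amici", "amico", "amicorum"}) {
            System.out.println(form + " -> " + stem(form));
        }
    }
}
```

Whether such a suffix list helps or hurts retrieval can only be judged by
running real queries against a Latin corpus, as described above.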
In addition, this code has several problems. For example, why does it look up
the -que forms in a List instead of a CharArraySet?
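A minimal sketch of the lookup in question, using java.util.HashSet as a
stand-in so that it runs without Lucene on the classpath. The class name, the
method name and the abbreviated word list are assumptions for illustration and
are not taken from the attached LatinStemmer.java. A List has to be scanned
entry by entry for every token, while a hash-based set answers in roughly
constant time.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class QueLookupSketch {
    // Hypothetical, abbreviated list of words in which "-que" belongs to
    // the word itself and must not be stripped.
    private static final Set<String> QUE_EXCEPTIONS = new HashSet<>(
        Arrays.asList("atque", "quoque", "neque", "itaque", "usque", "quisque"));

    // Removes the enclitic "-que" unless the term is a listed exception.
    static String stripQue(String term) {
        if (!term.endsWith("que") || QUE_EXCEPTIONS.contains(term)) {
            return term;
        }
        return term.substring(0, term.length() - 3);
    }

    public static void main(String[] args) {
        System.out.println(stripQue("atque"));      // listed exception, kept as is
        System.out.println(stripQue("populusque")); // enclitic removed
    }
}
```

Inside a TokenFilter, Lucene's CharArraySet would be the natural choice, because
it can test a region of the term's char[] buffer directly without creating a
String for each token.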
> Latin Stemmer for lucene
> ------------------------
>
> Key: LUCENE-6462
> URL: https://issues.apache.org/jira/browse/LUCENE-6462
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Niki
> Attachments: LatinStemmer.java
>
>
> The latest Lucene package has no stemmer for the Latin language. I have a
> stemmer for Latin: a rule-based program built on the grammar and rules of
> Latin.