[jira] [Comment Edited] (LUCENE-6462) Latin Stemmer for lucene

Uwe Schindler (JIRA) Mon, 11 May 2015 11:44:20 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538398#comment-14538398
 ]


Uwe Schindler edited comment on LUCENE-6462 at 5/11/15 6:43 PM:
----------------------------------------------------------------

Hi Niki,
the Latin Stemmer was originally written by Markus Klose. I would prefer that 
you submit your patch to his GitHub repository: https://github.com/scherziglu/solr

If Markus wants to donate his whole code (including the TokenFilters) to 
Lucene, he should do this himself so that his work is properly attributed. 
The stemmer alone (as attached to this issue) is not very helpful.

In general, stemmers do not necessarily need to produce "correct" forms; they 
only need to "normalize" terms to something that can be compared with other 
terms during query execution. So before making changes to a stemmer, it is very 
important to test those changes against a corpus of Latin texts and compare 
the results of queries on them. For search engines, stemmers should also be 
light, so that they do not remove too much information.
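
To make the normalization point concrete, here is a minimal sketch that groups 
terms by the stem a stemmer assigns to them. The toyStem rule and the word 
list are hypothetical and are NOT the LatinStemmer attached to this issue; 
they only show that a stem such as "puell" need not be a real word as long as 
related forms end up in the same group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.UnaryOperator;

final class StemConflation {

    // Hypothetical toy rule (an assumption for illustration only):
    // strip the first matching noun ending, keeping at least 3 characters.
    static String toyStem(String term) {
        String[] suffixes = {"ibus", "orum", "arum", "am", "ae", "as", "is", "a"};
        for (String suffix : suffixes) {
            if (term.length() > suffix.length() + 2 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    // Group the original terms by their stem, so two stemmer variants can be
    // compared by looking at which terms each of them conflates.
    static Map<String, List<String>> groupByStem(List<String> terms,
                                                 UnaryOperator<String> stemmer) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String term : terms) {
            groups.computeIfAbsent(stemmer.apply(term), k -> new ArrayList<>()).add(term);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(
            "puella", "puellae", "puellam", "puellarum", "rosa", "rosis");
        groupByStem(terms, StemConflation::toyStem)
            .forEach((stem, forms) -> System.out.println(stem + " <- " + forms));
    }
}
```

Running the same grouping with the old and the patched stemmer over terms 
taken from a real corpus would show where the two disagree, which is a cheap 
first check before comparing actual query results.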

In addition, this code has several problems: why does it look up the -que forms 
in a List instead of a CharArraySet?
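
A rough sketch of the lookup concern, using only the JDK: List.contains scans 
the list linearly for every token, while a hash-based set answers the same 
question in roughly constant time. java.util.HashSet is used here purely as a 
stand-in so the example runs without Lucene on the classpath; Lucene's 
CharArraySet serves the same purpose and can additionally test a slice of a 
char[] term buffer without creating a String first. The word list and the 
stripQue helper are assumptions for illustration and may differ from what the 
attached stemmer does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class QueLookupSketch {

    // Illustrative sample of words whose trailing "que" is not an enclitic;
    // the list in the attached LatinStemmer.java may be different.
    static final List<String> QUE_LIST =
        Arrays.asList("atque", "quoque", "neque", "itaque");

    // Built once; contains() is a hash lookup instead of a linear scan.
    static final Set<String> QUE_SET = new HashSet<>(QUE_LIST);

    // Remove an enclitic "-que" unless the whole term is a protected word.
    static String stripQue(String term) {
        if (term.length() > 3 && term.endsWith("que") && !QUE_SET.contains(term)) {
            return term.substring(0, term.length() - 3);
        }
        return term;
    }

    public static void main(String[] args) {
        System.out.println(stripQue("populusque"));
        System.out.println(stripQue("atque"));
    }
}
```

With a handful of entries the difference is small, but the check runs once per 
token during analysis, so a set-based lookup is the usual choice.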


was (Author: thetaphi):
Hi Niki,
the Latin Stemmer was originally written by Markus Klose. I would prefer to 
submit your patch to his Github repository: https://github.com/scherziglu/solr

If Markus wants to donate his whole code (including the TokenFilters) to 
Lucene. The stemmer alone (as attached to this issue) is not so helpful.

In general stemmers should not necessarily produce "correct" forms, they should 
just "normalize" terms to something which can be compared with other terms 
during query execution. So before making changes to stemmers it is very 
important to test those changes with a corpus of Latin texts and compare 
the results of queries on them. For search engines, stemmers should also be 
light (so not to remove too much information).

In addition, this code has several problems: why does it look up the -que forms 
in a List instead of a CharArraySet?

> Latin Stemmer for lucene
> ------------------------
>
>                 Key: LUCENE-6462
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6462
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Niki
>         Attachments: LatinStemmer.java
>
>
> In the latest Lucene package there is no stemmer for the Latin language. I 
> have a stemmer for Latin: a rule-based program built on the grammar and 
> rules of Latin.



