[ 
https://issues.apache.org/jira/browse/LUCENE-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801833#action_12801833
 ] 

Robert Muir commented on LUCENE-2055:
-------------------------------------

Now that we have snowball tests, I started looking at integrating snowball and 
deprecating this custom code. 
So I ran the snowball tests against these hand-coded algorithms to see what the 
differences are. Remember, they all claim to implement Porter's algorithms:

* RussianStemFilter passes 100% of the snowball tests.

* DutchStemFilter passes 98.9% of the snowball tests. All failures were in the 
handling of double consonants. Examples:
  aangetroffen -> aangetrof expected: aangetroff
  afvoerbonnen -> afvoerbon expected: afvoerbonn
  klommen -> klom expected: klomm

* FrenchStemFilter passes only 93.92% of the snowball tests, but if you put a 
LowerCaseFilter after it, it passes 99.66%!
The problem is that this stemmer incorrectly creates some uppercase stems from 
lowercase words. Examples:
  xviii -> xviI expected: xvii
  vouer -> voU expected: vou
  tranquille -> tranqUill expected: tranquill
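To make the French case concrete, here is a minimal sketch of the kind of comparison described above: it checks stemmer output against expected stems and reports a pass percentage, first as-is and then with lowercasing applied afterwards. The class StemCheck and its methods are hypothetical names for illustration, not part of Lucene or the snowball test suite, and the "stemmer" is just a lookup table holding the three observed outputs listed above, so the percentages it prints describe this toy sample only.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for illustration; not Lucene or snowball code.
class StemCheck {

    /** Percentage of (word -> expected stem) pairs the stemmer reproduces exactly. */
    static double passRate(Map<String, String> expected, Function<String, String> stemmer) {
        if (expected.isEmpty()) {
            return 100.0;
        }
        int passed = 0;
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (e.getValue().equals(stemmer.apply(e.getKey()))) {
                passed++;
            }
        }
        return 100.0 * passed / expected.size();
    }

    /** Stems the hand-coded French stemmer was observed to produce (from the examples above). */
    static Map<String, String> observedFrench() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("xviii", "xviI");
        m.put("vouer", "voU");
        m.put("tranquille", "tranqUill");
        return m;
    }

    /** Stems the snowball tests expect for the same words. */
    static Map<String, String> expectedFrench() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("xviii", "xvii");
        m.put("vouer", "vou");
        m.put("tranquille", "tranquill");
        return m;
    }

    public static void main(String[] args) {
        // Stand-in for the hand-coded stemmer: replay the observed outputs.
        Function<String, String> handCoded = observedFrench()::get;
        // Same stemmer with lowercasing applied to its output.
        Function<String, String> lowercased =
                handCoded.andThen(s -> s.toLowerCase(Locale.ROOT));

        System.out.println("as-is:      " + passRate(expectedFrench(), handCoded) + "%");
        System.out.println("lowercased: " + passRate(expectedFrench(), lowercased) + "%");
    }
}
```

On these three words every stem fails as-is and every stem passes once lowercased, which is the same effect, in miniature, as the jump from 93.92% to 99.66% on the full test set.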



> Remove duplicate analysis functionality
> ---------------------------------------
>
>                 Key: LUCENE-2055
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2055
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>             Fix For: 3.1
>
>
> I would like to mark the following code as deprecated, so it can be removed.
> * analyzers/fr: everything except ElisionFilter, which is unrelated and standalone.
> * analyzers/nl: entire package
> * analyzers/ru: entire package
> Below are excerpts from this code where they proudly proclaim they use the 
> snowball algorithm.
> I think we should delete all of this code in favor of the actual snowball 
> package.
> {noformat}
> /**
>  * A stemmer for French words. 
>  * <p>
>  * The algorithm is based on the work of
>  * Dr Martin Porter on his snowball project<br>
>  * refer to http://snowball.sourceforge.net/french/stemmer.html<br>
>  * (French stemming algorithm) for details
>  * </p>
>  */
> public class FrenchStemmer {
> /**
>  * A stemmer for Dutch words. 
>  * <p>
>  * The algorithm is an implementation of
>  * the <a href="http://snowball.tartarus.org/algorithms/dutch/stemmer.html">dutch stemming</a>
>  * algorithm in Martin Porter's snowball project.
>  * </p>
>  */
> public class DutchStemmer {
> /**
>  * Russian stemming algorithm implementation (see http://snowball.sourceforge.net for detailed description).
>  */
> class RussianStemmer
> {noformat}
