integrate snowball stopword lists
---------------------------------

                 Key: LUCENE-2206
                 URL: https://issues.apache.org/jira/browse/LUCENE-2206
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/analyzers
            Reporter: Robert Muir
             Fix For: 3.1


The Snowball project creates stopword lists as well as stemmers, for example:
http://svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt?view=markup

This patch includes the following:
* snowball stopword lists for 13 languages in contrib/snowball/resources
* all stoplists are unmodified apart from an added license header and a 
conversion from each file's original encoding to UTF-8
* added getSnowballWordSet to WordListLoader, because the format of these 
files is very different: for example, it supports multiple words per line 
and embedded comments.
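
To illustrate the file format described above (multiple words per line and 
embedded comments), here is a minimal sketch of a parser. It is not the 
getSnowballWordSet code from the patch: the class name SnowballStopwordSketch 
and the method parse are hypothetical, and it assumes that a vertical bar (|) 
starts a comment running to the end of the line, as in the Snowball stop.txt 
files.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the WordListLoader code from the patch.
class SnowballStopwordSketch {

    // Reads a Snowball-style stopword list: whitespace-separated words,
    // with '|' (assumed comment character) starting a comment to end of line.
    static Set<String> parse(Reader reader) {
        Set<String> words = new LinkedHashSet<String>();
        try {
            BufferedReader in = new BufferedReader(reader);
            String line;
            while ((line = in.readLine()) != null) {
                // Drop the comment part of the line, if there is one.
                int bar = line.indexOf('|');
                if (bar >= 0) {
                    line = line.substring(0, bar);
                }
                // A line may carry zero, one, or several words.
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        String sample =
              " | a comment-only line\n"
            + "i          | trailing comment\n"
            + "me my myself\n"
            + "\n"
            + "we our     | another comment\n";
        // prints [i, me, my, myself, we, our]
        System.out.println(parse(new StringReader(sample)));
    }
}
```

A LinkedHashSet is used here only so that the printed order matches the file 
order; any Set implementation would do for lookups.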

I have not changed SnowballAnalyzer to use these lists automatically yet; 
I would like us to discuss that in a future issue proposing the integration 
of snowball with contrib/analyzers.


