integrate snowball stopword lists
---------------------------------

                 Key: LUCENE-2206
                 URL: https://issues.apache.org/jira/browse/LUCENE-2206
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/analyzers
            Reporter: Robert Muir
             Fix For: 3.1

The Snowball project publishes stopword lists as well as stemmers, for example:
http://svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt?view=markup

This patch includes the following:
* Snowball stopword lists for 13 languages in contrib/snowball/resources
* all stoplists are unmodified; I only added a license header and converted each one from whatever encoding it was in to UTF-8
* added getSnowballWordSet to WordlistLoader, because the format of these files is very different from the existing word lists: for example, it supports multiple words per line and embedded comments.

I did not change SnowballAnalyzer to use these lists automatically yet. I would like us to discuss that in a future issue proposing the integration of snowball with contrib/analyzers.
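
To illustrate the file format that getSnowballWordSet has to handle, here is a minimal, self-contained sketch of such a parser. It is not the code from the patch: the class name SnowballStopwordSketch and the method parse are hypothetical, and it assumes (based on the published Snowball stop.txt files, not on anything stated in this message) that a vertical bar starts a comment running to the end of the line and that words on a line are separated by whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical illustration only; not the implementation attached to LUCENE-2206.
final class SnowballStopwordSketch {

    /**
     * Collects stopwords from Snowball-style text.
     * Assumption: '|' begins a comment that runs to the end of the line,
     * and a line may hold zero, one, or several whitespace-separated words.
     */
    static Set<String> parse(Reader reader) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Drop the embedded comment, if any.
            int comment = line.indexOf('|');
            if (comment >= 0) {
                line = line.substring(0, comment);
            }
            // Split the remainder on runs of whitespace; skip the empty
            // token produced by blank or comment-only lines.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String sample =
              " | a sample list in the Snowball style\n"
            + "i          | a word followed by a comment\n"
            + "me my   myself\n"
            + "\n"
            + "we\n";
        // Prints: [i, me, my, myself, we]
        System.out.println(parse(new StringReader(sample)));
    }
}
```

A one-word-per-line loader would treat "me my   myself" as a single entry and would keep the comment text, which is why a separate loading method is needed for these files.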