[ https://issues.apache.org/jira/browse/LUCENE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800021#action_12800021 ]
Robert Muir commented on LUCENE-2206:
-------------------------------------

I will commit this in a few days if no one objects.

Again, I have added getSnowballWordSet to WordListLoader, but if this is inappropriate we could instead have a SnowballWordListLoader in our snowball package or something similar; it doesn't matter to me.

> integrate snowball stopword lists
> ---------------------------------
>
>                 Key: LUCENE-2206
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2206
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.1
>
>         Attachments: LUCENE-2206.patch
>
>
> The snowball project creates stopword lists as well as stemmers, for example:
> http://svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt?view=markup
> This patch includes the following:
> * snowball stopword lists for 13 languages in contrib/snowball/resources
> * all stoplists are unmodified; I only added a license header and converted each
>   one from whatever encoding it was in to UTF-8
> * added getSnowballWordSet to WordListLoader, because the format of these files
>   is very different: for example, it supports multiple words per line and
>   embedded comments.
> I did not add any changes to SnowballAnalyzer to actually use these lists
> automatically yet. I would like us to discuss this in a future issue proposing
> integrating snowball with contrib/analyzers.
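
For readers unfamiliar with the file format mentioned above (multiple words per line, embedded comments), here is a minimal sketch of how such a list could be parsed. It is not the getSnowballWordSet code from the attached patch. The class and method names are hypothetical, and the sketch assumes that a `|` character starts a comment running to the end of the line, as in the snowball stop.txt files, and that words are separated by whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

public class SnowballStopwordSketch {

    // Hypothetical helper, NOT the method from LUCENE-2206.patch.
    // Assumption: '|' begins a comment that runs to the end of the line.
    public static Set<String> parseSnowballWords(Reader reader) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Drop the embedded comment, if any.
                int comment = line.indexOf('|');
                if (comment >= 0) {
                    line = line.substring(0, comment);
                }
                // A single line may carry several whitespace-separated words.
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        String sample =
              " | An illustrative stop word list\n"
            + "\n"
            + "i        | subject\n"
            + "me my myself\n";
        // Prints the parsed words; HashSet iteration order is unspecified.
        System.out.println(parseSnowballWords(new StringReader(sample)));
    }
}
```

Taking a Reader rather than a File leaves the character encoding to the caller, which matters here because the lists were converted to UTF-8.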