[ https://issues.apache.org/jira/browse/LUCENE-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-2503:
--------------------------------
    Attachment: LUCENE-2503.patch

I updated the patch; I think this is ready to go:
* added Finnish
* created vocabulary tests from the reference implementations (C, Perl, and so on), and found and fixed bugs in every language except en, pt, and fr (as promised in my last comment)
* created a VocabularyAssert JUnit utility class, and refactored the existing Snowball, Porter, German, and Russian tests to use it, too
* refactored a bunch of utility code that was duplicated everywhere, such as endsWith()/delete(), and put it in StemmerUtil

To apply the patch, first apply the patch itself, then unzip the zip file containing the vocabulary tests (LUCENE-2503_modules_analysis_testdata.zip) from the modules/analysis/common dir.

If no one objects, I'll commit in a few days.

> light/minimal stemming for euro languages
> -----------------------------------------
>
>                 Key: LUCENE-2503
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2503
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2503.patch, LUCENE-2503.patch
>
>
> The snowball stemmers are very aggressive and it would be nice if there were
> lighter alternatives.
> Some applications may want to perform less aggressive stemming, for example:
> http://www.lucidimagination.com/search/document/5d16391e21ca6faf/plural_only_stemmer
> Good, relevance tested algorithms exist and I think we should provide these
> alternatives.
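
For readers unfamiliar with the endsWith()/delete() helpers mentioned in the comment, here is a minimal sketch of what such shared suffix utilities might look like when operating on a char[] term buffer. This is not the code from the patch: the class name StemmerUtilSketch, the method signatures, and the toy stripPlural() rule are all assumptions made for illustration, and the rule shown is far simpler than any of the relevance-tested light stemmers the issue refers to.

```java
// Illustrative only: hypothetical helpers in the spirit of the endsWith()/delete()
// utilities described above. Names and signatures are assumptions, not the patch's API.
public final class StemmerUtilSketch {
  private StemmerUtilSketch() {}

  /** Returns true if the first {@code len} chars of {@code s} end with {@code suffix}. */
  public static boolean endsWith(char[] s, int len, String suffix) {
    int suffixLen = suffix.length();
    if (suffixLen > len) {
      return false;
    }
    for (int i = 0; i < suffixLen; i++) {
      if (s[len - suffixLen + i] != suffix.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  /** Deletes the char at {@code pos}, shifting the tail left; returns the new length. */
  public static int delete(char[] s, int pos, int len) {
    if (pos < len - 1) {
      System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
    }
    return len - 1;
  }

  /**
   * Toy "plural only" rule (hypothetical): drop a trailing 's' from words
   * longer than three chars that do not end in "ss". Returns the new length.
   */
  public static int stripPlural(char[] s, int len) {
    if (len > 3 && endsWith(s, len, "s") && !endsWith(s, len, "ss")) {
      return delete(s, len - 1, len);
    }
    return len;
  }

  public static void main(String[] args) {
    char[] buf = "stemmers".toCharArray();
    int len = stripPlural(buf, buf.length);
    System.out.println(new String(buf, 0, len)); // prints "stemmer"
  }
}
```

Working on the buffer in place and returning the new length avoids allocating a new String per token, which is presumably why helpers of this shape are worth sharing across the per-language stemmers.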