[ https://issues.apache.org/jira/browse/LUCENE-2503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879941#action_12879941 ]
Robert Muir commented on LUCENE-2503:
-------------------------------------

bq. Man are you fast!

Not really, I've been working on it for a while, but since someone asked I figured I would create the issue. Testing isn't done, but English, French, and Portuguese are OK, I think. The others need a lot of tests and probably have bugs.

bq. Does the English one deal with women/woman and foci/focus type stuff?

Nope, the English one is the Harman "s-stemming" algorithm. It's very simple:

{noformat}
if final is '-ies' but not '-eies' or '-aies' then replace '-ies' by '-y', return;
if final is '-es' but not '-aes', '-ees' or '-oes' then replace '-es' by '-e', return;
if final is '-s' but not '-us' or '-ss' then remove '-s'; return.
{noformat}

For special cases like the ones you mentioned (if you want them), I would recommend adding these customizations yourself as documented here: http://wiki.apache.org/solr/LanguageAnalysis#Customizing_Stemming

Just make a tab-separated file of word-stem pairs and put a StemmerOverrideFilter(Factory) before the stemmer in the stream. I think this alone provides a lot of flexibility. If it isn't enough, these stemmers are also much simpler to modify if you want to go that route :)

> light/minimal stemming for euro languages
> -----------------------------------------
>
>                 Key: LUCENE-2503
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2503
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>    Affects Versions: 3.1, 4.0
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2503.patch
>
>
> The snowball stemmers are very aggressive and it would be nice if there were
> lighter alternatives.
> Some applications may want to perform less aggressive stemming, for example:
> http://www.lucidimagination.com/search/document/5d16391e21ca6faf/plural_only_stemmer
> Good, relevance tested algorithms exist and I think we should provide these
> alternatives.
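For readers who want to try the three s-stemming rules quoted in the comment, here is a minimal plain-Java sketch. It is not the code from the attached patch and does not use any Lucene API. The class name `SStemSketch`, the methods `stem` and `stemWithOverrides`, and the in-memory override map (standing in for the tab-separated override file) are all hypothetical names chosen for illustration. Two further assumptions: the rules are tried in order and the first one whose full condition holds is applied, and words shorter than three characters are left untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: hypothetical class, not part of Lucene or the patch.
final class SStemSketch {

    // Applies the three quoted rules in order; the first rule whose
    // condition (including its exceptions) holds is applied.
    public static String stem(String word) {
        // Assumption: very short words are not stemmed.
        if (word.length() < 3) {
            return word;
        }
        // Rule 1: '-ies' becomes '-y', unless the ending is '-eies' or '-aies'.
        if (word.endsWith("ies") && !word.endsWith("eies") && !word.endsWith("aies")) {
            return word.substring(0, word.length() - 3) + "y";
        }
        // Rule 2: '-es' becomes '-e', unless the ending is '-aes', '-ees' or '-oes'.
        if (word.endsWith("es") && !word.endsWith("aes")
                && !word.endsWith("ees") && !word.endsWith("oes")) {
            return word.substring(0, word.length() - 1);
        }
        // Rule 3: drop a final '-s', unless the ending is '-us' or '-ss'.
        if (word.endsWith("s") && !word.endsWith("us") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Consults a word-to-stem override table before falling back to the rules,
    // mimicking the idea of placing an override step ahead of the stemmer.
    public static String stemWithOverrides(String word, Map<String, String> overrides) {
        String override = overrides.get(word);
        return override != null ? override : stem(word);
    }

    public static void main(String[] args) {
        // Hypothetical overrides for the irregular plurals raised in the question.
        Map<String, String> overrides = new HashMap<>();
        overrides.put("women", "woman");
        overrides.put("foci", "focus");

        String[] words = {"ponies", "horses", "cats", "focus", "glass", "women", "foci"};
        for (String w : words) {
            System.out.println(w + " -> " + stemWithOverrides(w, overrides));
        }
    }
}
```

Without the override map, `stem` leaves "women" and "foci" unchanged, since neither ends in "-s"; that is the gap the override step is meant to cover.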