Add the ability to KStemmer to preserve the original token when stemming
------------------------------------------------------------------------
Key: SOLR-3231
URL: https://issues.apache.org/jira/browse/SOLR-3231
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Jamie Johnson
Attachments: KStemFilter.patch
While using the PorterStemmer, I often found it far too aggressive in its
stemming. In my case it is unrealistic to provide a protected word list that
captures every word that should not be stemmed. To avoid this, I proposed
storing the original token alongside the stemmed token so that exact searches
always work. Based on discussions on the mailing list with Ahmet Arslan, I
believe the attached patch to KStemmer provides this capability through a
configuration parameter. It is largely a copy of
org.apache.lucene.wordnet.SynonymTokenFilter.
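
For illustration only, the "keep the original next to the stem" idea can be
sketched in plain Java with no Lucene dependency. This is not the attached
patch and does not use Lucene's TokenFilter API: the class
PreserveOriginalSketch, its Token type, and the toyStem method are all
hypothetical names, and toyStem is a crude stand-in for a real stemmer. The
sketch assumes the stem is emitted with a position increment of 0 so that it
occupies the same position as the original token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch: not the attached patch, not Lucene's TokenFilter API.
final class PreserveOriginalSketch {

    /** A term plus its position increment (0 = same position as the previous token). */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Emits every original token. When the stemmer changes the term, the stem
     * is also emitted at the same position, so both exact and stemmed queries
     * can match.
     */
    static List<Token> stemPreservingOriginal(List<String> terms,
                                              UnaryOperator<String> stemmer) {
        List<Token> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Token(term, 1));      // original advances the position
            String stem = stemmer.apply(term);
            if (!stem.equals(term)) {
                out.add(new Token(stem, 0));  // stem stacked on the original
            }
        }
        return out;
    }

    /** Toy stand-in for a real stemmer: strips a trailing "ing" or "s". */
    static String toyStem(String term) {
        if (term.endsWith("ing") && term.length() > 5) {
            return term.substring(0, term.length() - 3);
        }
        if (term.endsWith("s") && term.length() > 3) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }
}
```

Calling stemPreservingOriginal on the terms "running", "dogs", "cat" with
toyStem yields five tokens: each original with increment 1, plus "runn" and
"dog" with increment 0. Unchanged terms such as "cat" are not duplicated,
which keeps the index from growing for words the stemmer leaves alone.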