[
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427350#comment-13427350
]
Robert Muir commented on LUCENE-4284:
-------------------------------------
Really, all of these analyzers are just simple examples and are not intended to
solve all use cases.
You can make your own that doesn't lowercase at all with hardly any code, and
if you want to control the case sensitivity of the stopword set, do that on
the stop set itself
(pass the boolean to StopFilter.makeStopSet etc).
{noformat}
Analyzer a = new ReusableAnalyzerBase() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // No LowerCaseFilter in the chain, so tokens keep their original case.
    Tokenizer source = new LetterTokenizer(matchVersion, reader);
    return new TokenStreamComponents(source, new StopFilter(matchVersion, source, stopwords));
  }
};
{noformat}
Otherwise we would have to add options to every Analyzer for everyone's possible
use cases,
which is too many (we will never make everyone happy).
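To make the intended behaviour concrete without depending on any Lucene classes, here is a minimal library-free sketch. The class StopwordSketch and its two methods are hypothetical names made up for illustration; this is not how StopFilter is implemented. The ignoreCase flag only mirrors the idea of the boolean passed to StopFilter.makeStopSet: matching against the stop set may ignore case, while the tokens that survive keep their original case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
final class StopwordSketch {

    // Builds a stop set. When ignoreCase is true, contains() matches
    // regardless of case because the set is ordered case-insensitively.
    static Set<String> makeStopSet(boolean ignoreCase, String... words) {
        Set<String> set;
        if (ignoreCase) {
            set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
        } else {
            set = new TreeSet<String>();
        }
        set.addAll(Arrays.asList(words));
        return set;
    }

    // Splits on runs of non-letters, drops tokens found in the stop set,
    // and never changes the case of the tokens it keeps.
    static List<String> filter(String text, Set<String> stopSet) {
        List<String> out = new ArrayList<String>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty() && !stopSet.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }
}
```

With a case-insensitive stop set of "the" and "and", the input "The Quick fox and THE dog" yields Quick, fox, dog. With a case-sensitive set, only the lowercase "and" is removed, so "The" and "THE" pass through untouched.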
> RFE: stopword filter without lowercase side-effect
> --------------------------------------------------
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Sam Halliday
> Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable
> anymore, due to the @Deprecation of the only constructor in StopFilter that
> allows this.
> Please support some way to allow stop-word removal without lowercasing the
> output:
> http://stackoverflow.com/questions/11777785