[
https://issues.apache.org/jira/browse/LUCENE-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427353#comment-13427353
]
Sam Halliday commented on LUCENE-4284:
--------------------------------------
OK, thanks. All I actually needed was to remove stop words from a String, so
the following did the trick:
{noformat}
// ignoreCase=true: stop words match case-insensitively, token text is left as-is.
Set<Object> stops = StopFilter.makeStopSet(Version.LUCENE_36,
    Lists.newArrayList(StopAnalyzer.ENGLISH_STOP_WORDS_SET), true);
Tokenizer tokeniser = new ClassicTokenizer(Version.LUCENE_36, new StringReader(text));
StopFilter stopFilter = new StopFilter(Version.LUCENE_36, tokeniser, stops);
// Fetch the term attribute once instead of looking it up for every token.
CharTermAttribute term = stopFilter.addAttribute(CharTermAttribute.class);
List<String> words = Lists.newArrayList();
try {
    stopFilter.reset();
    while (stopFilter.incrementToken()) {
        words.add(term.toString());
    }
    stopFilter.end();
    stopFilter.close();
} catch (IOException ex) {
    throw new GuruMeditationFailure();
}
{noformat}
The API is a bit of a labyrinth - it'll take me some time to understand many of
the design decisions.
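
For comparison, here is a minimal plain-Java sketch of the same idea, case-insensitive stop-word matching that leaves the surviving tokens' case untouched. It does not use Lucene at all: the class name StopWordDemo, the abbreviated stop list and the regex-based splitting are assumptions made for illustration, and the tokenisation is far cruder than ClassicTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class StopWordDemo {
    // Hypothetical, abbreviated stop list; NOT Lucene's ENGLISH_STOP_WORDS_SET.
    // The case-insensitive comparator plays the role of ignoreCase=true.
    static final Set<String> STOPS = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
    static {
        STOPS.addAll(Arrays.asList("a", "an", "and", "is", "of", "the", "to"));
    }

    // Returns the tokens of text that are not stop words, in their original case.
    static List<String> removeStopWords(String text) {
        List<String> words = new ArrayList<>();
        // Crude tokenisation: split on runs of characters that are neither
        // letters nor digits (an assumption, not what ClassicTokenizer does).
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOPS.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [Quick, brown, Fox, Lucene, API]
        System.out.println(removeStopWords("The Quick brown Fox is OF the Lucene API"));
    }
}
```

"The", "OF" and "is" are dropped regardless of case, while "Quick", "Fox" and "API" keep their capitalisation, which is the behaviour the RFE asks for.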
> RFE: stopword filter without lowercase side-effect
> --------------------------------------------------
>
> Key: LUCENE-4284
> URL: https://issues.apache.org/jira/browse/LUCENE-4284
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Sam Halliday
> Priority: Minor
>
> It would appear that accept()-time lowercasing of Tokens is not favourable
> anymore, due to the @Deprecation of the only constructor in StopFilter that
> allows this.
> Please support some way to allow stop-word removal without lowercasing the
> output:
> http://stackoverflow.com/questions/11777785