Hayden Muhl created LUCENE-5211:
-----------------------------------

             Summary: StopFilterFactory does not honor comments
                 Key: LUCENE-5211
                 URL: https://issues.apache.org/jira/browse/LUCENE-5211
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 4.2
            Reporter: Hayden Muhl

The StopFilterFactory builds a CharArraySet directly from the raw lines of the supplied words file. This causes a problem when using the stop word files supplied with the Solr/Lucene distribution: the comments in those files get added to the CharArraySet. A line like this

    ceci | this

should result in the string "ceci" being added to the CharArraySet, but "ceci | this" is what actually gets added.

Workaround: Remove all comments from the stop word files you are using.

Suggested fix: The StopFilterFactory should strip any comments, then strip trailing whitespace. The stop word files supplied with the distribution should be edited to conform to the supported comment format.
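
For illustration, the suggested fix (strip any comment, then strip trailing whitespace) could look roughly like the sketch below. This is not the actual StopFilterFactory code. The class name StopWordLines and its methods are hypothetical, the "|" comment marker is assumed from the "ceci | this" example, and a plain java.util.Set stands in for CharArraySet so the example runs without Lucene on the classpath.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: it only illustrates the suggested fix.
class StopWordLines {

    // Assumed comment marker, taken from the "ceci | this" example in the report.
    private static final char COMMENT_MARKER = '|';

    // Returns the line with any comment removed and trailing whitespace stripped.
    // Leading whitespace is left alone, since the report only mentions trailing whitespace.
    static String stripComment(String line) {
        int marker = line.indexOf(COMMENT_MARKER);
        String withoutComment = (marker >= 0) ? line.substring(0, marker) : line;
        int end = withoutComment.length();
        while (end > 0 && Character.isWhitespace(withoutComment.charAt(end - 1))) {
            end--;
        }
        return withoutComment.substring(0, end);
    }

    // Builds a plain Set<String> as a stand-in for Lucene's CharArraySet.
    static Set<String> parse(List<String> rawLines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : rawLines) {
            String word = stripComment(line);
            if (!word.isEmpty()) { // skip blank and comment-only lines
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("ceci | this", " | comment-only line", "cela", "");
        System.out.println(parse(lines));
    }
}
```

With the sample lines in main, the set contains "ceci" and "cela", and the raw string "ceci | this" is not added.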