Hayden Muhl created LUCENE-5211:
-----------------------------------

             Summary: StopFilterFactory does not honor comments
                 Key: LUCENE-5211
                 URL: https://issues.apache.org/jira/browse/LUCENE-5211
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/search
    Affects Versions: 4.2
            Reporter: Hayden Muhl


The StopFilterFactory builds a CharArraySet directly from the raw lines of the 
supplied words file. This causes a problem when using the stop word files 
supplied with the Solr/Lucene distribution. In particular, the comments in 
those files get added to the CharArraySet. For example, the line

ceci           |  this

should result in the string "ceci" being added to the CharArraySet, but the
whole line, "ceci           |  this", is what actually gets added.

Workaround: Remove all comments from stop word files you are using.

Suggested fix: The StopFilterFactory should strip any comments, then strip 
trailing whitespace. The stop word files supplied with the distribution should 
be edited to conform to the supported comment format.
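
A minimal sketch of the suggested fix, for illustration only. It is not the
actual StopFilterFactory code. The class and method names (StopWordLines,
stripComment, parse) are hypothetical, and it assumes "|" is the comment
marker, as in the example line above. A plain java.util.Set stands in for
CharArraySet so the sketch has no Lucene dependency. Note that String.trim()
removes leading as well as trailing whitespace, which is slightly more than
the fix described above asks for.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class StopWordLines {
    // Assumed comment marker, taken from the example line "ceci | this".
    static final char COMMENT_CHAR = '|';

    // Drops everything from the comment marker onward, then trims
    // surrounding whitespace. Returns "" for blank or comment-only lines.
    static String stripComment(String line) {
        int idx = line.indexOf(COMMENT_CHAR);
        String content = (idx >= 0) ? line.substring(0, idx) : line;
        return content.trim();
    }

    // Builds the stop word set from raw file lines, skipping lines that
    // contain no word once the comment has been removed.
    static Set<String> parse(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String word = stripComment(line);
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

With this sketch, the line "ceci           |  this" contributes only "ceci" to
the set, and a line consisting solely of a comment contributes nothing.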
