ShingleFilter: don't output all-filler shingles/unigrams; also, convert from TermAttribute to CharTermAttribute
---------------------------------------------------------------------------------------------------------------

                 Key: LUCENE-2400
                 URL: https://issues.apache.org/jira/browse/LUCENE-2400
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/analyzers
    Affects Versions: 3.0.1
            Reporter: Steven Rowe
            Priority: Minor


When the input token stream to ShingleFilter has position increments greater
than one, ShingleFilter inserts a filler token for each position that has no
token in the input stream.  As a result, unigrams (if configured) and shingles
can consist entirely of filler tokens.  Such filler-only output tokens carry no
information and should not be emitted.
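To make the problem concrete, here is a minimal, self-contained Java sketch.
It does not use Lucene: the class FillerShingleSketch, its Token type and the
"_" filler string are hypothetical stand-ins chosen for illustration, not
ShingleFilter's actual API or implementation.  It expands position gaps into
filler entries, builds unigrams and shingles, and skips any output made up
only of fillers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of shingling over a token stream with
 * position gaps. This is NOT Lucene's ShingleFilter; it only illustrates
 * how filler-only unigrams/shingles arise and how they can be dropped.
 */
public class FillerShingleSketch {

    // Assumed filler string, chosen here for illustration only.
    static final String FILLER = "_";

    /** A term plus its position increment relative to the previous token. */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Expands position gaps into one filler entry per empty position. */
    static List<String> withFillers(List<Token> tokens) {
        List<String> positions = new ArrayList<>();
        for (Token t : tokens) {
            // A position increment of n means n - 1 positions were skipped.
            for (int i = 1; i < t.posInc; i++) {
                positions.add(FILLER);
            }
            positions.add(t.term);
        }
        return positions;
    }

    /**
     * Emits unigrams (optionally) and shingles of size 2..maxSize for each
     * start position, skipping any output made up entirely of fillers.
     */
    static List<String> shingles(List<Token> tokens, int maxSize, boolean outputUnigrams) {
        List<String> positions = withFillers(tokens);
        List<String> out = new ArrayList<>();
        int minSize = outputUnigrams ? 1 : 2;
        for (int start = 0; start < positions.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= positions.size(); size++) {
                List<String> window = positions.subList(start, start + size);
                boolean allFiller = true;
                for (String term : window) {
                    if (!FILLER.equals(term)) {
                        allFiller = false;
                        break;
                    }
                }
                if (allFiller) {
                    continue; // drop filler-only unigrams and shingles
                }
                out.add(String.join(" ", window));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two positions were removed between "hello" and "world"
        // (e.g. by a stop filter), so "world" has a position increment of 3.
        List<Token> tokens = List.of(new Token("hello", 1), new Token("world", 3));
        System.out.println(shingles(tokens, 2, true));
        // prints [hello, hello _, _ world, world]
    }
}
```

With the allFiller check removed, the same input would also yield "_" (twice)
and "_ _", which are exactly the filler-only tokens this issue proposes to
suppress.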

Also, because TermAttribute has been deprecated in favor of CharTermAttribute,
the patch will convert ShingleFilter's TermAttribute usages to
CharTermAttribute.

-- 
This message is automatically generated by JIRA.