ShingleFilter: don't output all-filler shingles/unigrams; also, convert from
TermAttribute to CharTermAttribute
---------------------------------------------------------------------------------------------------------------
Key: LUCENE-2400
URL: https://issues.apache.org/jira/browse/LUCENE-2400
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/analyzers
Affects Versions: 3.0.1
Reporter: Steven Rowe
Priority: Minor
When the input token stream to ShingleFilter has position increments greater
than one, filler tokens are inserted for each position for which there is no
token in the input token stream. As a result, unigrams (if configured) and
shingles can be filler-only. Filler-only output tokens contain no actual terms
and make no sense, so they should not be output.
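To illustrate the problem, here is a minimal, self-contained sketch in plain Java. It is a simplified model and not the ShingleFilter code or the patch: the class ShingleSketch, its methods, and the dropFillerOnly flag are hypothetical names, a gap is represented as null, and the filler is assumed to render as "_" with a single space as the token separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Lucene's ShingleFilter.
final class ShingleSketch {
    static final String FILLER = "_"; // assumed rendering of a filler token

    /** Lays tokens out by position; a null slot marks a position with no input token. */
    static List<String> withFillers(String[] terms, int[] posIncs) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            // A position increment of k leaves k - 1 empty positions before the token.
            for (int gap = 1; gap < posIncs[i]; gap++) {
                slots.add(null);
            }
            slots.add(terms[i]);
        }
        return slots;
    }

    /** Builds unigrams and shingles of up to maxSize slots, optionally dropping filler-only ones. */
    static List<String> shingles(List<String> slots, int maxSize, boolean dropFillerOnly) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < slots.size(); start++) {
            StringBuilder sb = new StringBuilder();
            boolean hasRealToken = false;
            for (int n = 1; n <= maxSize && start + n <= slots.size(); n++) {
                String slot = slots.get(start + n - 1);
                if (n > 1) {
                    sb.append(' ');
                }
                sb.append(slot == null ? FILLER : slot);
                hasRealToken |= slot != null;
                // Emit only if at least one real token is present, unless filtering is off.
                if (hasRealToken || !dropFillerOnly) {
                    out.add(sb.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "divide" has a position increment of 3, so two positions before it are empty.
        List<String> slots = withFillers(
                new String[] {"please", "divide", "sentence"}, new int[] {1, 3, 1});
        System.out.println(shingles(slots, 2, false)); // includes "_" and "_ _"
        System.out.println(shingles(slots, 2, true));  // filler-only entries removed
    }
}
```

In this model, with dropFillerOnly set to false the output contains the filler-only entries "_" (twice) and "_ _"; with it set to true only outputs containing at least one real token remain, such as "please _" and "_ divide".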
In addition, because TermAttribute has been deprecated in favor of
CharTermAttribute, the patch will convert the TermAttribute usages in
ShingleFilter to CharTermAttribute.