DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT <http://issues.apache.org/bugzilla/show_bug.cgi?id=35456>. ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND INSERTED IN THE BUG DATABASE.
http://issues.apache.org/bugzilla/show_bug.cgi?id=35456

           Summary: NGramFilter -- construct n-grams from a TokenStream
           Product: Lucene
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Analysis
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]

This filter constructs n-grams (token combinations up to a fixed size, sometimes called "shingles") from a token stream. The filter sets start offsets, end offsets and position increments, so highlighting and phrase queries should work.

Position increments > 1 in the input stream are replaced by filler tokens (tokens with termText "_" and endOffset - startOffset = 0) in the output n-grams. (Position increments > 1 in the input stream are usually caused by removing some tokens, e.g. stopwords, from a stream.)

The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache Commons-Collections.

Filter, test case and an analyzer are attached.
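For readers unfamiliar with shingling, below is a minimal sketch of the idea on the classic Token-returning TokenStream API; it is not the attached implementation. The class name SimpleBigramFilter is hypothetical, it emits only unigrams and bigrams, and it does not insert the "_" filler tokens for gaps that the attached filter provides.

import java.io.IOException;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

/**
 * Minimal sketch of a bigram ("shingle") filter. Each input token is
 * passed through unchanged, and every adjacent pair is additionally
 * emitted as a combined token stacked at the second token's position
 * (position increment 0), so offsets and phrase positions stay usable.
 * Gaps (position increments > 1) are not handled here.
 */
public class SimpleBigramFilter extends TokenFilter {

  private Token previous;       // last token pulled from the input
  private Token pendingBigram;  // bigram waiting to be emitted

  public SimpleBigramFilter(TokenStream input) {
    super(input);
  }

  public Token next() throws IOException {
    // Emit a buffered bigram before pulling more input.
    if (pendingBigram != null) {
      Token t = pendingBigram;
      pendingBigram = null;
      return t;
    }

    Token current = input.next();
    if (current == null) {
      return null;
    }

    if (previous != null) {
      // Combine the previous and current terms; the offsets span both
      // source tokens, and position increment 0 stacks the bigram on
      // the same position as the current unigram.
      Token bigram = new Token(
          previous.termText() + " " + current.termText(),
          previous.startOffset(), current.endOffset());
      bigram.setPositionIncrement(0);
      pendingBigram = bigram;
    }

    previous = current;
    return current;
  }
}

In use, such a filter would simply wrap an existing token stream inside an analyzer, e.g. new SimpleBigramFilter(new StandardTokenizer(reader)); the attached analyzer presumably wires up the real filter the same way.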
