On 4/9/2012 at 3:06 PM, [email protected] wrote:
> LUCENE-3969: [...] tenatively add posLen to ShingleFilter
> [...]
> +++ lucene/dev/branches/lucene3969/modules/analysis/common/src/java/org/apache/lucene/analysis/shingle/ShingleFilter.java Mon Apr 9 19:05:47 2012
> [...]
> @@ -319,6 +321,8 @@ public final class ShingleFilter extends
>          noShingleOutput = false;
>        }
>        offsetAtt.setOffset(offsetAtt.startOffset(), nextToken.offsetAtt.endOffset());
> +      // nocommit is this right!? i'm just guessing...
> +      posLenAtt.setPositionLength(builtGramSize);
>        isOutputHere = true;
>        gramSize.advance();
>        tokenAvailable = true;

+1, looks right to me. builtGramSize is the position length of the output shingle, since missing positions (e.g. from stop words) are represented as "filler" tokens.

Steve
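
P.S. To illustrate the point about fillers, here is a toy sketch in plain Java. It is not the ShingleFilter code and does not use any Lucene API; the class ShinglePosLenSketch, its methods, and the "_" filler string are made up for illustration. It assumes every position increment is at least 1, expands skipped positions into filler terms, and then shows that a shingle built from N slots spans exactly N positions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: hypothetical names, no Lucene classes involved.
final class ShinglePosLenSketch {
    static final String FILLER = "_"; // assumed filler text for this sketch

    /** A shingle's text plus the number of positions it spans. */
    static final class Shingle {
        final String text;
        final int posLen;

        Shingle(String text, int posLen) {
            this.text = text;
            this.posLen = posLen;
        }
    }

    /**
     * Expands position gaps into filler terms so that every position
     * holds exactly one term. Assumes each posIncs[i] >= 1.
     */
    static List<String> fillGaps(String[] terms, int[] posIncs) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            // A position increment of k means k - 1 positions were skipped.
            for (int skipped = 1; skipped < posIncs[i]; skipped++) {
                slots.add(FILLER);
            }
            slots.add(terms[i]);
        }
        return slots;
    }

    /** Builds all shingles of exactly gramSize slots, fillers included. */
    static List<Shingle> shingles(String[] terms, int[] posIncs, int gramSize) {
        List<String> slots = fillGaps(terms, posIncs);
        List<Shingle> out = new ArrayList<>();
        for (int start = 0; start + gramSize <= slots.size(); start++) {
            int end = start + gramSize; // exclusive
            String text = String.join(" ", slots.subList(start, end));
            // One slot per position, so the span equals the slot count.
            out.add(new Shingle(text, end - start));
        }
        return out;
    }

    public static void main(String[] args) {
        // "wizard of oz" with the stop word "of" removed: "oz" has posInc 2.
        String[] terms = {"wizard", "oz"};
        int[] posIncs = {1, 2};
        for (Shingle s : shingles(terms, posIncs, 2)) {
            System.out.println(s.text + " -> posLen " + s.posLen);
        }
    }
}
```

In this toy model the removed stop word still occupies a slot, so both two-slot shingles ("wizard _" and "_ oz") get a position length of 2, which is the same reasoning behind using builtGramSize above.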
