Hi Mathieu,

From the class comment for ShingleFilter:
This filter handles position increments > 1 by inserting filler tokens (tokens with termtext "_"). It does not handle a position increment of 0.

You could use this feature by setting (in an upstream filter) the positionIncrement of each sentence-starting word to be at least as large as the maximum shingle size. This would result in sentence-ending shingles like ". _" and sentence-beginning shingles like "_ Word".

Steve

On 04/06/2008 at 1:23 PM, Mathieu Lecarme wrote:
> The new ShingleFilter is very helpful for fetching groups of words, but
> it doesn't handle punctuation or any separation.
> If you feed it multiple sentences, you will get shingles that
> start in one sentence and end in the next.
> In order to avoid that, you can look at token positions: if there is
> more than one character between a token and the previous one, it should
> be punctuation (or a typo).
> Any suggestions for getting only shingles within the same sentence?
>
> M.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
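
For illustration, here is a rough sketch of the position-increment idea suggested above. It is a plain Python simulation and does not use the Lucene API: the function names (mark_sentence_starts, expand_gaps, shingles) and the set of sentence-ending tokens are assumptions made up for this example. It also assumes that punctuation is kept as separate tokens, emits only shingles of two or more tokens (no unigrams), and drops shingles made up entirely of filler tokens, which may differ from what ShingleFilter actually does.

```python
FILLER = "_"  # same filler termtext the class comment mentions


def mark_sentence_starts(terms, max_shingle_size, sentence_end=(".", "!", "?")):
    """Hypothetical upstream step: pair each term with a position increment.

    A term that follows sentence-ending punctuation gets an increment equal
    to max_shingle_size; every other term gets the usual increment of 1.
    """
    tokens = []
    previous = None
    for term in terms:
        increment = max_shingle_size if previous in sentence_end else 1
        tokens.append((term, increment))
        previous = term
    return tokens


def expand_gaps(tokens):
    """Insert (increment - 1) filler tokens before each term."""
    expanded = []
    for term, increment in tokens:
        if increment < 1:
            # Mirrors the class comment: an increment of 0 is not handled.
            raise ValueError("position increment must be >= 1")
        expanded.extend([FILLER] * (increment - 1))
        expanded.append(term)
    return expanded


def shingles(tokens, max_shingle_size):
    """Return all shingles of 2..max_shingle_size tokens, in stream order."""
    stream = expand_gaps(tokens)
    result = []
    for start in range(len(stream)):
        for size in range(2, max_shingle_size + 1):
            window = stream[start:start + size]
            if len(window) < size:
                break  # ran off the end of the stream
            if all(term == FILLER for term in window):
                continue  # simplification: skip filler-only shingles
            result.append(" ".join(window))
    return result


if __name__ == "__main__":
    terms = ["Hello", "world", ".", "Good", "day", "."]
    print(shingles(mark_sentence_starts(terms, 2), 2))
    # ['Hello world', 'world .', '. _', '_ Good', 'Good day', 'day .']
```

With a maximum shingle size of n, the gap adds n - 1 fillers, so any window that touched both sentences would need at least n + 1 tokens. That is why no shingle crosses the sentence boundary, and why the boundary shows up as ". _" and "_ Good" instead.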