Hi Mathieu,

From the class comment for ShingleFilter:

  This filter handles position increments > 1 by inserting
  filler tokens (tokens with termtext "_"). It does not
  handle a position increment of 0.

You could take advantage of this by setting (in an upstream filter) the 
positionIncrement of each sentence-starting word to be at least as large as the 
maximum shingle size.  This would result in sentence-ending shingles like ". _" 
and sentence-beginning shingles like "_ Word", so no shingle should contain 
words from two different sentences.
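
To illustrate the idea, here is a rough standalone sketch in plain Java. It 
does not use the Lucene API: the Tok class and the markSentenceStarts, 
expandGaps and shingles helpers are hypothetical names that only mimic the 
behaviour described in the class comment (gaps in position are filled with 
"_" tokens). It also assumes that ".", "!" and "?" arrive as separate tokens, 
which depends on your tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone simulation, NOT Lucene code: all names here are hypothetical.
class SentenceShingles {
    static final String FILLER = "_";

    // Minimal stand-in for a token: term text plus position increment.
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // "Upstream filter": the word following a sentence-ending token gets a
    // position increment equal to the maximum shingle size; others get 1.
    static List<Tok> markSentenceStarts(List<String> words, int maxShingleSize) {
        List<Tok> out = new ArrayList<>();
        boolean sentenceStart = false; // first token of the stream keeps increment 1
        for (String w : words) {
            out.add(new Tok(w, sentenceStart ? maxShingleSize : 1));
            sentenceStart = w.equals(".") || w.equals("!") || w.equals("?");
        }
        return out;
    }

    // Mimics the documented behaviour: an increment of k inserts k-1 fillers.
    static List<String> expandGaps(List<Tok> toks) {
        List<String> out = new ArrayList<>();
        for (Tok t : toks) {
            for (int i = 1; i < t.posInc; i++) out.add(FILLER);
            out.add(t.term);
        }
        return out;
    }

    // Builds every shingle of exactly `size` consecutive terms.
    static List<String> shingles(List<String> terms, int size) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= terms.size(); i++) {
            out.add(String.join(" ", terms.subList(i, i + size)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Hello", "world", ".", "Next", "one", ".");
        List<String> terms = expandGaps(markSentenceStarts(words, 2));
        System.out.println(shingles(terms, 2));
    }
}
```

With a maximum shingle size of 2, the bigrams at the sentence boundary come 
out as ". _" and "_ Next", and ". Next" is never produced.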

Steve

On 04/06/2008 at 1:23 PM, Mathieu Lecarme wrote:
> The new ShingleFilter is very helpful for fetching groups of words, but
> it doesn't handle punctuation or any other separation.
> If you feed it multiple sentences, you will get shingles that
> start in one sentence and end in the next.
> In order to avoid that, you can use token positions: if there is
> more than one char between a token and the previous one, it should be
> punctuation (or a typo).
> Any suggestions for keeping shingles within a single sentence?
> 
> M.
> 
> --------------------------------------------------------------------- To
> unsubscribe, e-mail: [EMAIL PROTECTED] For
> additional commands, e-mail: [EMAIL PROTECTED]
> 
>

 

