Re: NGrams and positions

Hiroaki Kawai Fri, 16 May 2008 08:49:57 -0700

I don't think this is a question of what n-grams are in general.

NGramTokenFilter is a TokenFilter, and it produces a
TRICKY token stream because the text is processed by more
than one tokenizer.


This discussion is about the mechanism of TokenFilter
itself.

In the current implementation, NGramTokenFilter creates
such a tricky token stream that one might consider it
a new variant of n-gram.

The token stream generated by NGramTokenFilter is the
result not only of n-gram tokenization but also of whatever
other tokenizers ran before it, so the token stream
might not look like a normal n-gram sequence.
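
To illustrate the difference, here is a minimal sketch in plain Python. It
does not use the Lucene API: `whitespace_tokens` is a hypothetical stand-in
for an upstream tokenizer such as StandardTokenizer, and `ngram_filter` is a
hypothetical stand-in for a filter that n-grams each incoming token
separately. It only shows why the two streams differ, not how Lucene
assigns positions.

```python
def ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def whitespace_tokens(text):
    # Hypothetical stand-in for an upstream word tokenizer.
    return text.split()


def ngram_filter(tokens, n):
    """N-gram each upstream token independently (assumed filter behavior)."""
    out = []
    for tok in tokens:
        out.extend(ngrams(tok, n))
    return out


text = "abc de"

# N-gram tokenizer applied directly to the raw text: grams span the space.
plain = ngrams(text, 2)

# Word tokenizer first, then n-gram filter: no gram crosses a word boundary.
filtered = ngram_filter(whitespace_tokens(text), 2)

print(plain)
print(filtered)
```

The second stream drops the grams that span the word boundary, which is why
it may not look like a normal n-gram sequence over the original text.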

I think Grant is talking about StandardTokenizer + NGramTokenFilter,
isn't he?


Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> On May 16, 2008, at 11:13 AM, Hiroaki Kawai wrote:
> 
> > I think LUCENE-1224 is more complex than LUCENE-1225.
> >
> > First, I want to solve LUCENE-1225. It might be
> > simpler to understand.
> >
> > For LUCENE-1224, I ran into the same issue. My current
> > understanding is that it comes from a mismatch between TokenFilter and
> > position. I apologize that the patch is confusing. I'm aware that the
> > patch still has another issue.
> 
> The patch itself isn't confusing, IMO (the only issue with the patch  
> is the unit test, but that is for the JIRA discussion).  I think it  
> does what it says it does.  This discussion is more philosophical as  
> to what kinds of things people want to do with ngrams in general.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

