I don't think this is a matter of what n-grams are in general. NGramTokenFilter is a TokenFilter, and it produces a TRICKY token stream because the text has already passed through more than one tokenizing stage before the n-grams are built.
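To make the difference concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It assumes, for illustration only, that the upstream tokenizer just splits on whitespace (a rough stand-in for StandardTokenizer) and that an n-gram tokenizer reads the whole input as a single string. The class NGramSketch and its methods are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {

    // Character n-grams of length n over one string. This approximates an
    // n-gram tokenizer that sees the whole input as a single stream.
    static List<String> charNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Hypothetical stand-in for "word tokenizer + n-gram filter": the text is
    // first split into word tokens (whitespace split assumed here), then each
    // token is n-grammed on its own, so no gram crosses a token boundary.
    static List<String> tokenizeThenNGram(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            grams.addAll(charNGrams(token, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        String text = "ab cd";
        System.out.println(charNGrams(text, 2));
        System.out.println(tokenizeThenNGram(text, 2));
    }
}
```

For "ab cd" with n = 2, the whole-string version yields grams such as "b " and " c" that span the word boundary, while the tokenize-then-filter version yields only "ab" and "cd". A token shorter than n yields no grams at all, which is another way the filtered stream differs from a plain n-gram stream.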
This discussion is about the mechanism of TokenFilter itself. In the current implementation, NGramTokenFilter creates such a tricky token stream that one might consider it a new kind of n-gram. The token stream generated by NGramTokenFilter is shaped not only by n-gram tokenization but also by whatever tokenizer runs before it, so it may not look like a normal n-gram stream. I think Grant is talking about StandardTokenizer + NGramTokenFilter, isn't he?

Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> On May 16, 2008, at 11:13 AM, Hiroaki Kawai wrote:
>
> > I think LUCENE-1224 is more complex than LUCENE-1225.
> >
> > First, I want to solve LUCENE-1225. It might be
> > simpler to understand.
> >
> > For LUCENE-1224, I ran into the same issue. My current
> > understanding is that this comes from a mismatch between TokenFilter and position.
> > I apologize that the patch is confusing. I'm aware that the patch
> > still has another issue.
>
> The patch itself isn't confusing, IMO (the only issue with the patch
> is the unit test, but that is for the JIRA discussion). I think it
> does what it says it does. This discussion is more philosophical as
> to what kinds of things people want to do with ngrams in general.