[
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hiroaki Kawai updated LUCENE-1224:
----------------------------------
Attachment: LUCENE-1224.patch
Updated the patch with a unit test.
LUCENE-1225 is an easier way to understand this problem. This patch also
covers the token filter issues, which are more complicated.
> NGramTokenFilter creates bad TokenStream
> ----------------------------------------
>
> Key: LUCENE-1224
> URL: https://issues.apache.org/jira/browse/LUCENE-1224
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Assignee: Grant Ingersoll
> Priority: Critical
> Attachments: LUCENE-1224.patch, NGramTokenFilter.patch,
> NGramTokenFilter.patch
>
>
> With the current trunk NGramTokenFilter(min=2,max=4), I index the string
> "abcdef", but I can't find it with the query "abc". If I query with "ab",
> I do get a hit.
> The reason is that NGramTokenFilter generates a badly ordered TokenStream.
> A query depends on the Token order in the TokenStream: how stemming or
> phrases are analyzed is based on that order (Token.positionIncrement).
> With the current filter, the query string "abc" is tokenized to: ab bc abc
> meaning "query a string that has ab bc abc in this order".
> The expected filter would generate: ab abc(positionIncrement=0) bc
> meaning "query a string that has (ab|abc) bc in this order".
> I'd like to submit a patch for this issue. :-)
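The two orderings described above can be illustrated with a small stand-alone sketch. It does not use the Lucene API and is not the attached patch; the class name NGramOrderSketch, the methods byLength and byPosition, and the "(+n)" suffix for the position increment are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; it does not use the Lucene API
// and is not taken from the attached patch.
class NGramOrderSketch {

    // Order reported for the current filter: all grams of length min,
    // then all grams of length min+1, and so on.
    public static List<String> byLength(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            for (int start = 0; start + n <= s.length(); start++) {
                out.add(s.substring(start, start + n));
            }
        }
        return out;
    }

    // Expected order: grams grouped by start offset. The first gram at an
    // offset advances the position by 1; the longer grams at the same
    // offset stay on that position (increment 0).
    public static List<String> byPosition(String s, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < s.length(); start++) {
            int inc = 1;
            for (int n = min; n <= max && start + n <= s.length(); n++) {
                out.add(s.substring(start, start + n) + "(+" + inc + ")");
                inc = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(byLength("abc", 2, 4));   // [ab, bc, abc]
        System.out.println(byPosition("abc", 2, 4)); // [ab(+1), abc(+0), bc(+1)]
    }
}
```

In the second ordering, "ab" and "abc" share one position and "bc" follows at the next, which corresponds to the "(ab|abc) bc" reading of the query.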