[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597079#action_12597079 ]

Grant Ingersoll commented on LUCENE-1224:
-----------------------------------------

OK, let me change the comment.  You can test this problem without indexing and
querying.  All of the information is available on the token.  I would suggest
you revert the test to its original version and then modify testNGrams() by
adding asserts that check that the positionIncrement value is set properly.  By
going the indexing/querying route, you are not only testing the token filters,
but pretty much all of Lucene, and are thus subject to any problems there.  In
other words, it ain't a unit test.  If you set the positionIncrement properly
and test for it, it will work in Lucene for the queries, etc.  If it doesn't,
we have much bigger problems than ngrams.  That being said, if you want to fix
testNGrams() and leave the query case in, that is fine by me.
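
A minimal sketch of the kind of assertion suggested above, written as plain
Java so it runs without Lucene. checkTokens() and the parallel term/increment
arrays are hypothetical stand-ins for reading Token objects off the filter;
the expected values come from the "abc" example in the issue description,
and the "current" values assume every gram gets the default increment of 1.

```java
public class PosIncAssertSketch {
    // Hypothetical helper: returns null when the produced terms and position
    // increments match the expected ones, otherwise a message describing the
    // first mismatching token.
    static String checkTokens(String[] expTerms, int[] expIncs,
                              String[] terms, int[] incs) {
        if (terms.length != expTerms.length || incs.length != expTerms.length) {
            return "expected " + expTerms.length + " tokens, got " + terms.length;
        }
        for (int i = 0; i < expTerms.length; i++) {
            if (!expTerms[i].equals(terms[i])) {
                return "token " + i + ": expected term " + expTerms[i]
                        + ", got " + terms[i];
            }
            if (expIncs[i] != incs[i]) {
                return "token " + i + " (" + terms[i] + "): expected positionIncrement "
                        + expIncs[i] + ", got " + incs[i];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] expTerms = {"ab", "abc", "bc"};
        int[] expIncs = {1, 0, 1};
        // Output as described for the current filter, assuming each gram advances the position.
        System.out.println(checkTokens(expTerms, expIncs,
                new String[]{"ab", "bc", "abc"}, new int[]{1, 1, 1}));
        // Expected output: abc stacked on ab with positionIncrement=0.
        System.out.println(checkTokens(expTerms, expIncs, expTerms, expIncs));
    }
}
```

The first call reports "token 1: expected term abc, got bc" and the second
prints null. Checking the token stream directly like this exercises only the
filter, which is the point of keeping it a unit test.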



> NGramTokenFilter creates bad TokenStream
> ----------------------------------------
>
>                 Key: LUCENE-1224
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1224
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>            Reporter: Hiroaki Kawai
>            Assignee: Grant Ingersoll
>            Priority: Critical
>         Attachments: LUCENE-1224.patch, NGramTokenFilter.patch, 
> NGramTokenFilter.patch
>
>
> With current trunk NGramTokenFilter(min=2,max=4), I index the string "abcdef"
> into an index, but I can't query it with "abc". If I query with "ab", I do
> get a hit.
> The reason is that NGramTokenFilter generates a badly ordered TokenStream.
> Queries depend on the order of the Tokens in the TokenStream: how stemming
> or phrases are analyzed is based on that order (Token.positionIncrement).
> With the current filter, the query string "abc" is tokenized to: ab bc abc,
> meaning "query a string that has ab bc abc in this order".
> The expected filter would generate: ab abc(positionIncrement=0) bc,
> meaning "query a string that has (ab|abc) bc in this order".
> I'd like to submit a patch for this issue. :-)
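
One way to produce the expected ordering, sketched in plain Java rather than
against Lucene's API: grams are grouped by start offset, the shortest gram at
each offset advances the position (increment 1), and longer grams at the same
offset are stacked on it with positionIncrement=0. The Gram class and the
ngrams() method are hypothetical illustrations, not the code in the attached
patches.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // Hypothetical stand-in for a Lucene Token: term text plus position increment.
    static final class Gram {
        final String term;
        final int posInc;

        Gram(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return term + "(" + posInc + ")";
        }
    }

    // Emits grams ordered by start offset, then by length. Only the first
    // (shortest) gram at each start offset advances the position.
    static List<Gram> ngrams(String text, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start + minGram <= text.length(); start++) {
            int posInc = 1;
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                out.add(new Gram(text.substring(start, start + len), posInc));
                posInc = 0; // longer grams at this start share the same position
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 2, 4));
    }
}
```

For "abc" with min=2, max=4 this prints [ab(1), abc(0), bc(1)], matching the
expected output in the description.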

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

