[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597156#action_12597156 ]
Hiroaki Kawai commented on LUCENE-1224:
---------------------------------------

About test code: I'm not going to say that "I'm right". I just wanted to address the issue and share what we should solve. If you don't like the code, please just tell me a better way to do it. I initially put the code there because I thought it was reasonable and proper, but I'm fine with changing it.

{quote}
For example, I think it makes sense to search for "th ex" as a phrase query
{quote}

For example, I think it makes sense to search for "example" as a phrase query instead. I want to point out that NGramTokenizer is very useful for languages that are not separated by white space, for example Japanese. In that case, we won't search for "th ex", because it assumes sentences are separated by white space. I want to search by a fragment of a text sequence.

I agree that this might be a big problem. IMHO, the issue comes from a conceptual mismatch between TokenFilter and TermPosition.

Should the discussion be moved to the mailing list?

> NGramTokenFilter creates bad TokenStream
> ----------------------------------------
>
>                 Key: LUCENE-1224
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1224
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/*
>            Reporter: Hiroaki Kawai
>            Assignee: Grant Ingersoll
>            Priority: Critical
>         Attachments: LUCENE-1224.patch, NGramTokenFilter.patch, NGramTokenFilter.patch
>
>
> With current trunk NGramTokenFilter(min=2,max=4), I index the string "abcdef"
> into an index, but I can't query it with "abc". If I query with "ab", I can
> get a hit result.
> The reason is that the NGramTokenFilter generates a badly ordered TokenStream.
> A query is based on the Token order in the TokenStream; how stemming or
> phrases should be analyzed is based on that order (Token.positionIncrement).
> With the current filter, the query string "abc" is tokenized to : ab bc abc
> meaning "query a string that has ab bc abc in this order".
> Expected filter will generate : ab abc(positionIncrement=0) bc
> meaning "query a string that has (ab|abc) bc in this order"
> I'd like to submit a patch for this issue. :-)
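
To make the two token orders in the issue description concrete, here is a minimal sketch in plain Python. It does not use Lucene; the function names `ngrams_by_size` and `ngrams_by_offset` are hypothetical and only mimic the ordering described above, with the position increment represented as a plain integer.

```python
def ngrams_by_size(text, min_n, max_n):
    """Order described as the current behaviour: all grams of size n, then n+1."""
    return [text[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(text) - n + 1)]


def ngrams_by_offset(text, min_n, max_n):
    """Order the issue asks for: grams grouped by start offset.

    Returns (gram, position_increment) pairs. The first gram at each start
    offset gets increment 1; further grams at the same offset get 0, so they
    share a position with the first one.
    """
    out = []
    for i in range(len(text)):
        first = True
        for n in range(min_n, max_n + 1):
            if i + n > len(text):
                break  # longer grams will not fit at this offset either
            out.append((text[i:i + n], 1 if first else 0))
            first = False
    return out


print(ngrams_by_size("abc", 2, 4))    # ['ab', 'bc', 'abc']
print(ngrams_by_offset("abc", 2, 4))  # [('ab', 1), ('abc', 0), ('bc', 1)]
```

With the second ordering, "ab" and "abc" sit at the same position and "bc" follows, which corresponds to the "(ab|abc) bc" reading in the description.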