Hi Adrien, I think I might have made a mistake. There are two levels of processing in an Analyzer: one is the Tokenizer, which is HMMChineseTokenizer, and the other is the Analyzer itself, which may apply additional filtering. I'm using Lucene's standard interface to set an Analyzer instance for indexing, but I'm using the Tokenizer directly to parse the raw query text when building the Query.

The weird thing is, there is a Lucene query-parser module, but it handles meta syntax like AND/OR and fieldName:xxx, so I think it cannot deal with raw query text directly? And when I try to use the higher-level Analyzer.tokenStream() to split the raw query text into terms, I run into a very confusing API: TokenStream has no obvious method to get the terms (the filtered tokens), only the Attribute concept, which looks like it's meant for Lucene internals. Where can I find sample code to extract the filtered tokens from the TokenStream interface?
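For what it's worth, here is the kind of consumer loop I'm after, pieced together from the javadocs: a minimal sketch, assuming Lucene 8.x with the smartcn module on the classpath. The field name "body" and the sample text are arbitrary; the Attribute API is apparently the intended public way to read tokens, not just an internal detail.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenExtractor {

    /** Runs the full analysis chain (tokenizer + filters) and collects the terms. */
    public static List<String> analyze(Analyzer analyzer, String field, String text)
            throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            // A TokenStream exposes per-token data through Attribute views:
            // fetch the CharTermAttribute once, then re-read it after every
            // successful incrementToken() call.
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                          // mandatory before consuming
            while (ts.incrementToken()) {
                terms.add(termAtt.toString());   // snapshot the current term
            }
            ts.end();                            // mandatory after consuming
        }                                        // try-with-resources closes the stream
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze(new SmartChineseAnalyzer(), "body", "我爱北京天安门"));
    }
}
```

The reset() / incrementToken() / end() / close() sequence seems to be required in that exact order; skipping reset() throws an IllegalStateException in recent Lucene versions.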
Adrien Grand <jpou...@gmail.com> wrote on Fri, Jan 10, 2020, at 4:53 PM:

> It should match. My guess is that you might not be reusing the same
> positions as set by the analysis chain when creating the phrase query?
> Can you show us how you build the phrase query?
>
> On Fri, Jan 10, 2020 at 9:24 AM 小鱼儿 <ctengc...@gmail.com> wrote:
>
> > I use SmartChineseAnalyzer to do the indexing, and add a document with
> > a TextField whose value is a long sentence; when analyzed, it yields
> > 18 terms.
> >
> > Then I use the same value to construct a PhraseQuery, setting the slop
> > to 2 and adding the 18 terms consecutively...
> >
> > I expect the search API to find this document, but it returns empty.
> >
> > Where am I wrong?
>
> --
> Adrien
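To make Adrien's point concrete: if the phrase query just adds the 18 terms at consecutive positions 0..17, it can disagree with the positions the analysis chain actually indexed (filters may emit position-increment gaps). A sketch of building the PhraseQuery from the analyzer's own output, assuming Lucene 8.x and honoring PositionIncrementAttribute (field name and slop are placeholders):

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class AnalyzedPhraseQuery {

    /** Builds a PhraseQuery whose term positions mirror the analysis chain. */
    public static PhraseQuery build(Analyzer analyzer, String field, String text, int slop)
            throws IOException {
        PhraseQuery.Builder builder = new PhraseQuery.Builder();
        builder.setSlop(slop);
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
            PositionIncrementAttribute posAtt =
                    ts.addAttribute(PositionIncrementAttribute.class);
            ts.reset();
            int position = -1;
            while (ts.incrementToken()) {
                // Accumulate increments so gaps (e.g. removed stopwords)
                // are preserved, instead of assuming consecutive positions.
                position += posAtt.getPositionIncrement();
                builder.add(new Term(field, termAtt.toString()), position);
            }
            ts.end();
        }
        return builder.build();
    }
}
```

With positions taken from the stream itself, the query should line up with what the indexer stored, so the slop of 2 is no longer needed to paper over position drift.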