You can make your own analyzer, or do something like the following at query time:
    QueryParser queryParser = new QueryParser(Version.LUCENE_30, "myfieldname",
        new PositionHackAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_30)));

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    // PositionFilter lives in contrib/analyzers, not in the core jar.
    import org.apache.lucene.analysis.position.PositionFilter;

    // Wraps another analyzer and passes its output through PositionFilter.
    // Use it at query time only, never at index time.
    public class PositionHackAnalyzerWrapper extends Analyzer {
      private final Analyzer wrapped;

      public PositionHackAnalyzerWrapper(Analyzer wrapped) {
        this.wrapped = wrapped;
      }

      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream ts = wrapped.tokenStream(fieldName, reader);
        return new PositionFilter(ts);
      }
    }

2010/7/1 Kolhoff, Jacqueline - ENCOWAY <kolh...@encoway.de>

> How can I add this PositionFilter? I can't see anything in the API. I use
> Lucene version 3.0.1; this is my query parser:
>
> QueryParser queryParser = new QueryParser(Version.LUCENE_30, "myfieldname",
>     new StandardAnalyzer(Version.LUCENE_30));
>
> -----Original Message-----
> From: Robert Muir [mailto:rcm...@gmail.com]
> Sent: Thursday, 1 July 2010 12:34
> To: java-user@lucene.apache.org
> Subject: Re: Lucene and Chinese language
>
> This is a bug in the QueryParser
> (https://issues.apache.org/jira/browse/LUCENE-2458).
>
> The problem has nothing to do with your choice of analyzer; it has to do
> with how the query is formed.
>
> Currently the QueryParser uses a convoluted algorithm involving whitespace
> (and not just the double-quote operator, as you would expect) to form phrase
> queries. So queries like this one, with no whitespace, always form phrase
> queries.
>
> The only workaround for reasonably good results consists of two steps:
> 1. At query time (only!) add an
>    org.apache.lucene.analysis.position.PositionFilter (from contrib/analyzers)
>    to your analyzer. Don't do this at index time, just at query time!
> 2. This will make all terms in the query "synonyms" of each other to bypass
>    this problem, but will screw up scoring, so you might also want to extend
>    QueryParser in a custom way:
>
>    @Override
>    protected BooleanQuery newBooleanQuery(boolean disableCoord) {
>      // intentionally ignore the disabled coord() factor
>      // that results from the PositionFilter hack
>      return new BooleanQuery(false);
>    }
>
> 2010/7/1 Kolhoff, Jacqueline - ENCOWAY <kolh...@encoway.de>
>
> > Hi!
> >
> > We are using Lucene in our project to search through information objects,
> > which works fine. For indexing we use the StandardAnalyzer.
> > Now we have to support the Chinese language. I found out that the Chinese
> > words and characters are correctly saved in the index, but the query to
> > search for them does not work. Example: in English the query is "text",
> > which we parse to "*text*". If we search for Chinese words or phrases like
> > "佛山东方书城", the query is "*佛山东方书城*", but there are no search
> > results. If the query places blanks between the single characters, like
> > this: "*佛 山 东 方 书 城*", we get results. Does the StandardAnalyzer
> > interpret each Chinese character as one word? What are best practices for
> > this case? Should we use another analyzer (a Chinese analyzer)? Or is it
> > better to replace the query parser in this case?
> >
> > Regards,
> > Jacqueline.
>
> --
> Robert Muir
> rcm...@gmail.com

--
Robert Muir
rcm...@gmail.com
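A small Lucene-free sketch may help show why the PositionFilter trick avoids the phrase query and what the newBooleanQuery override is for. It assumes, as Jacqueline observed, that StandardAnalyzer emits one token per Chinese character, and it only models Lucene 3.0 behaviour: QueryParser turns several tokens at distinct positions into a PhraseQuery, but turns tokens that all share a single position (which is what PositionFilter produces) into a BooleanQuery of SHOULD clauses built with disableCoord set to true. The class, method names, and output strings below are hypothetical and are not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Lucene-free model of the query-time behaviour discussed in this thread.
// All names and output strings are hypothetical approximations of Lucene 3.0.
public class PositionHackDemo {

    // A token plus its position increment (cf. PositionIncrementAttribute).
    static final class Token {
        final String text;
        final int posInc;

        Token(String text, int posInc) {
            this.text = text;
            this.posInc = posInc;
        }
    }

    // Assumption: one token per character, each advancing the position by 1,
    // which is how StandardAnalyzer treats Chinese text.
    static List<Token> tokenizePerChar(String s) {
        List<Token> out = new ArrayList<>();
        s.codePoints().forEach(cp -> out.add(new Token(new String(Character.toChars(cp)), 1)));
        return out;
    }

    // Models PositionFilter with its default increment of 0: the first token
    // keeps its increment, every later token is stacked on the same position.
    static List<Token> positionFilter(List<Token> in) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < in.size(); i++) {
            Token t = in.get(i);
            out.add(i == 0 ? t : new Token(t.text, 0));
        }
        return out;
    }

    // Simplified version of the choice QueryParser.getFieldQuery makes in 3.0.
    static String describeQuery(String field, List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "no query";
        }
        if (tokens.size() == 1) {
            return "TermQuery: " + field + ":" + tokens.get(0).text;
        }
        int positionCount = 0;
        boolean stacked = false;
        List<String> texts = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc != 0) {
                positionCount += t.posInc;
            } else {
                stacked = true;
            }
            texts.add(t.text);
        }
        if (stacked && positionCount == 1) {
            // Every token at one position: one SHOULD clause per token ("synonyms").
            StringBuilder sb = new StringBuilder("BooleanQuery(SHOULD):");
            for (String text : texts) {
                sb.append(' ').append(field).append(':').append(text);
            }
            return sb.toString();
        }
        if (stacked) {
            return "MultiPhraseQuery over " + positionCount + " positions";
        }
        // Distinct positions: the characters must occur adjacent and in order.
        return "PhraseQuery: " + field + ":\"" + String.join(" ", texts) + "\"";
    }

    // Models DefaultSimilarity.coord(overlap, maxOverlap) = overlap / maxOverlap;
    // a BooleanQuery created with disableCoord == true uses 1.0 instead.
    static float coord(int overlap, int maxOverlap, boolean disableCoord) {
        return disableCoord ? 1.0f : overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        List<Token> plain = tokenizePerChar("佛山东方书城");
        System.out.println(describeQuery("myfieldname", plain));
        System.out.println(describeQuery("myfieldname", positionFilter(plain)));
        // A document that matches 2 of the 6 characters:
        System.out.println("coord, disableCoord=true:  " + coord(2, 6, true));
        System.out.println("coord, disableCoord=false: " + coord(2, 6, false));
    }
}
```

With coord disabled, a document matching only 2 of the 6 characters gets a coord factor of 1.0 instead of 2/6, so partial matches are not penalized; returning `new BooleanQuery(false)` from `newBooleanQuery` keeps that factor.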