Tom, please open an issue for this.

On Fri, Aug 3, 2012 at 6:19 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Hello all,
>
> About 10% of our queries that contain Han characters are single-character
> queries. It looks like the CJKBigramFilter only outputs single characters
> when there are no adjacent bigrammable characters in the input. This means
> we have to create a separate field to index Han unigrams in order to address
> single-character queries, and then write application code to search that
> separate field if we detect a single-character Han query. This is rather
> kludgy. As an alternative approach to dealing with single-character Han
> queries, would it be possible to add an optional flag to the
> CJKBigramFilter to tell it to also output unigrams?
>
> That way, on indexing we could set the flag so that both unigrams and bigrams
> would be indexed. On querying we would not set the flag, so that the current
> logic, which outputs bigrams unless there is a single Han character (in which
> case that gets output), would take care of queries containing a single Han
> unigram.
>
> This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter.
>
> If this makes sense I'll open a JIRA issue.
>
> Tom Burton-West
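
For the issue, it may help to spell out the behaviour being asked for. Below is a rough, stand-alone Java sketch of my reading of the proposal. It does not call Lucene and is not how CJKBigramFilter is implemented; the class name HanGramSketch, the method grams, and the outputUnigrams parameter are all hypothetical names made up for illustration. It also assumes the input is a single run of adjacent Han characters, and it uses Unicode escapes for the three characters of "Tokyo-to" (U+6771, U+4EAC, U+90FD) so the source stays ASCII-only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plain-Java model of the proposed flag, not Lucene code.
class HanGramSketch {

    /**
     * Returns the terms produced for one run of adjacent Han characters.
     * With outputUnigrams == false this models the behaviour described in
     * the thread: bigrams only, except that a lone character is emitted
     * as-is. With outputUnigrams == true (the hypothetical new flag) every
     * character is emitted in addition to the bigrams.
     */
    static List<String> grams(String hanRun, boolean outputUnigrams) {
        int[] cps = hanRun.codePoints().toArray();
        List<String> out = new ArrayList<>();
        if (cps.length == 1) {
            // A single character has no neighbour to pair with.
            out.add(new String(cps, 0, 1));
            return out;
        }
        for (int i = 0; i < cps.length; i++) {
            if (outputUnigrams) {
                out.add(new String(cps, i, 1));   // unigram at position i
            }
            if (i + 1 < cps.length) {
                out.add(new String(cps, i, 2));   // bigram starting at i
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "\u6771\u4EAC\u90FD";   // three adjacent Han characters
        String query = "\u4EAC";             // single-character query

        // Index time: flag set, so unigrams and bigrams are both indexed.
        List<String> indexed = grams(doc, true);
        // Query time: flag not set, so the lone character comes out as-is.
        List<String> queryTerms = grams(query, false);

        System.out.println("indexed terms: " + indexed);
        System.out.println("query terms:   " + queryTerms);
        System.out.println("query matches: " + indexed.containsAll(queryTerms));
    }
}
```

With the flag off at index time, the same document would produce only the two bigrams, so the single-character query would find nothing in that field. That is the gap the separate unigram field currently papers over.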
--
lucidimagination.com