Thanks Robert,

Opened: LUCENE-4286 <https://issues.apache.org/jira/browse/LUCENE-4286>

Tom

On Fri, Aug 3, 2012 at 6:22 PM, Robert Muir <rcm...@gmail.com> wrote:

> Tom, please open an issue for this.
>
> On Fri, Aug 3, 2012 at 6:19 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> > Hello all,
> >
> > About 10% of our queries that contain Han characters are single-character
> > queries. It looks like the CJKBigramFilter only outputs single characters
> > when there are no adjacent bigrammable characters in the input. This means
> > we have to create a separate field to index Han unigrams in order to
> > address single-character queries, and then write application code to
> > search that separate field if we detect a single-character Han query.
> > This is rather kludgy. As an alternative approach to dealing with
> > single-character Han queries, would it be possible to add an optional
> > flag to the CJKBigramFilter to tell it to also output unigrams?
> >
> > That way, on indexing we could set the flag so that both unigrams and
> > bigrams would be indexed. On querying we would not set the flag, so that
> > the current logic, which outputs bigrams unless there is a single Han
> > character (in which case that character gets output), would take care of
> > queries containing a single Han unigram.
> >
> > This is somewhat analogous to the flags in LUCENE-1370 for the
> > ShingleFilter.
> >
> > If this makes sense I'll open a JIRA issue.
> >
> > Tom Burton-West
>
> --
> lucidimagination.com
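
For anyone skimming the thread, here is a rough sketch of the behavior being proposed. It is plain Python, not Lucene code: the function name han_ngrams, the output_unigrams parameter, the token ordering, and the approximation of "Han" as the basic CJK Unified Ideographs block are all assumptions made for illustration, and the real filter may differ on each of them.

```python
import re

# Assumption: "Han" is approximated by the basic CJK Unified Ideographs block.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def han_ngrams(text, output_unigrams=False):
    """Hypothetical model of the proposed flag, not the Lucene implementation.

    For each run of adjacent Han characters:
      - a run of length 1 is always emitted as a single character;
      - a longer run is emitted as overlapping bigrams;
      - with output_unigrams=True, each character of a longer run is
        emitted as well.
    Non-Han text is ignored in this sketch.
    """
    tokens = []
    for run in HAN_RUN.findall(text):
        if len(run) == 1:
            # Current behavior described in the thread: a lone Han
            # character has nothing to pair with, so it is output as-is.
            tokens.append(run)
            continue
        for i, ch in enumerate(run):
            if output_unigrams:
                tokens.append(ch)
            if i + 1 < len(run):
                tokens.append(run[i:i + 2])
    return tokens


# Index side: flag set, so unigrams and bigrams are both produced.
index_terms = han_ngrams("中华人民", output_unigrams=True)

# Query side: flag not set, so a single-character query yields that character.
query_terms = han_ngrams("人")
```

With the flag set at index time and unset at query time, the single-character query term is present among the indexed terms, so no separate unigram field or application-side query routing is needed in this model.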