Re: Add flag to CJKBigramFilter to also output unigrams (Single character Han queries)

Robert Muir Fri, 03 Aug 2012 15:22:58 -0700

Tom, please open an issue for this.

On Fri, Aug 3, 2012 at 6:19 PM, Tom Burton-West <tburt...@umich.edu> wrote:
> Hello all,
>
> About 10% of our queries that contain Han characters are single-character
> queries. It looks like the CJKBigramFilter only outputs single characters
> when there are no adjacent bigrammable characters in the input. This means
> we have to create a separate field to index Han unigrams in order to address
> single-character queries, and then write application code to search that
> separate field if we detect a single-character Han query. This is rather
> kludgy. As an alternative approach to dealing with single-character Han
> queries, would it be possible to add an optional flag to the
> CJKBigramFilter to tell it to also output unigrams?
>
> That way, at index time we could set the flag so that both unigrams and
> bigrams would be indexed. At query time we would leave the flag unset, so
> that the current logic (which outputs bigrams, except that a lone Han
> character is output as a unigram) would handle queries consisting of a
> single Han character.
>
> This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter.
>
> If this makes sense I'll open a JIRA issue.
>
> Tom Burton-West
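
To make the proposal above concrete, here is a rough, self-contained sketch of the behavior being described. It is not Lucene code and does not reflect the real CJKBigramFilter implementation: the function names `is_han` and `cjk_tokens` and the `output_unigrams` parameter are hypothetical, the Han check covers only the basic CJK Unified Ideographs block, and non-Han characters are simply dropped to keep the example short.

```python
def is_han(ch):
    # Simplifying assumption: only the basic CJK Unified Ideographs block counts as Han.
    return "\u4e00" <= ch <= "\u9fff"


def cjk_tokens(text, output_unigrams=False):
    """Emit overlapping bigrams for each run of adjacent Han characters.

    A run of length one is always emitted as a unigram (the current behavior
    described in the message). When output_unigrams is True (the proposed
    flag), every character in a longer run is also emitted on its own.
    """
    tokens = []
    run = []

    def flush():
        if len(run) == 1:
            # Lone Han character: nothing to pair it with, so emit it as-is.
            tokens.append(run[0])
        else:
            for i, ch in enumerate(run):
                if output_unigrams:
                    tokens.append(ch)
                if i + 1 < len(run):
                    tokens.append(ch + run[i + 1])
        run.clear()

    for ch in text:
        if is_han(ch):
            run.append(ch)
        else:
            # Any non-Han character ends the current run (and is dropped here).
            flush()
    flush()
    return tokens


# Index time: flag set, so unigrams and bigrams are both produced.
indexed = cjk_tokens("中国人", output_unigrams=True)
# Query time: flag unset, so a single-character query yields one unigram.
query = cjk_tokens("中")
```

Under these assumptions, the single token produced for the query is among the tokens produced at index time, so the separate unigram field and the application-side query detection would no longer be needed.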




-- 
lucidimagination.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
