Naomi Dushay created SOLR-5212:
----------------------------------
Summary: bad qs and mm when using edismax for field with
CJKBigramFilter
Key: SOLR-5212
URL: https://issues.apache.org/jira/browse/SOLR-5212
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 4.4
Reporter: Naomi Dushay
Priority: Critical
When I have a field using CJKBigramFilter, a mysterious qs value appears in my
parsed query (the ~N after the clause group in the debugQuery output below).
The value that appears is the minimum of the mm setting and the number of
tokens the analyzer produces for the query string.
If I use a field in qf that has only bigrams, then qs is set to MIN(original mm
setting, number of bigrams in query string)
arg sent in: q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 2 bigrams
debugQuery
<str name="rawquerystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_bi_search:旧小
cjk_bi_search:小说)~2))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_bi_search:旧小
cjk_bi_search:小说)~2))~0.01 ()</str>
If I use a field in qf that has only unigrams, then qs is set to MIN(original
mm setting, number of unigrams in query string)
arg sent in: q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 3 unigrams
debugQuery
<str name="rawquerystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_uni_search:旧
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_uni_search:旧 cjk_uni_search:小
cjk_uni_search:说)~3))~0.01 ()</str>
If I use a field in qf that has both bigrams and unigrams, then qs is set to
MIN(original mm setting, number of bigrams + unigrams in query string).
arg sent in: q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5
debugQuery
<str name="rawquerystring">{!qf=cjk_both_pub_search pf= pf2=
pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_both_search:旧
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
cjk_both_search:说)~5))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_both_search:旧
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
cjk_both_search:说)~5))~0.01 ()</str>
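To make the token counts in the three cases above easy to check, here is a
minimal Python sketch that approximates what the analysis chain emits for a run
of Han characters. It is not the CJKBigramFilter implementation: cjk_tokens is
a hypothetical helper, it ignores the ICU tokenizer and the other filters, and
it assumes the input is a single run of at least two CJK characters.

```python
def cjk_tokens(text, bigrams=True, unigrams=False):
    """Approximate the tokens for one run of CJK characters (hypothetical helper)."""
    tokens = []
    for i, ch in enumerate(text):
        if unigrams:
            # one token per character
            tokens.append(ch)
        if bigrams and i + 1 < len(text):
            # one token per overlapping pair of adjacent characters
            tokens.append(text[i:i + 2])
    return tokens


query = "旧小说"
bi = cjk_tokens(query, bigrams=True, unigrams=False)    # like cjk_bi_search
uni = cjk_tokens(query, bigrams=False, unigrams=True)   # like cjk_uni_search
both = cjk_tokens(query, bigrams=True, unigrams=True)   # like cjk_both_search

for name, toks in (("bi", bi), ("uni", uni), ("both", both)):
    print(name, len(toks), toks)
```

The counts come out as 2, 3 and 5, which are the same numbers that show up
after the ~ in the parsed queries.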
I am running Solr 4.4. I have field types defined like so:
<fieldtype name="text_cjk_both" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
katakana="true" hangul="true" outputUnigrams="true" />
</analyzer>
</fieldtype>
<fieldtype name="text_cjk_bi" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
katakana="true" hangul="true" outputUnigrams="false" />
</analyzer>
</fieldtype>
<fieldtype name="text_cjk_uni" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
</fieldtype>
The request handler uses edismax:
<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="mm">6&lt;-1 6&lt;90%</str>
<int name="qs">1</int>
<int name="ps">0</int>
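For reference, the mm spec in these defaults can be evaluated by hand. Below is
a minimal Python sketch of how I understand the documented mm syntax
(conditional N<expr parts, negative integers, percentages). It is not Solr's
code: min_should_match is a hypothetical helper, and it assumes there are no
spaces around the < in the spec.

```python
def min_should_match(clause_count, spec):
    """Evaluate an mm spec for a number of optional clauses (sketch, not Solr's code)."""
    result = clause_count
    spec = spec.strip()
    if "<" in spec:
        # Conditional parts such as "6<-1": the expression after "<" only
        # applies when there are more than 6 optional clauses.
        for part in spec.split():
            bound, _, expr = part.partition("<")
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, expr)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "all clauses except this many".
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


for n in (2, 3, 5):
    print(n, min_should_match(n, "6<-1 6<90%"))
# prints:
# 2 2
# 3 3
# 5 5
```

With six or fewer optional clauses this spec requires all of them, so if each
analyzer token is counted as its own clause, 2, 3 and 5 tokens give exactly the
~2, ~3 and ~5 seen in the debug output.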