Naomi Dushay created SOLR-5212:
----------------------------------

             Summary: bad qs and mm when using edismax for field with CJKBigramFilter
                 Key: SOLR-5212
                 URL: https://issues.apache.org/jira/browse/SOLR-5212
             Project: Solr
          Issue Type: Bug
          Components: search
    Affects Versions: 4.4
            Reporter: Naomi Dushay
            Priority: Critical


When I have a field using CJKBigramFilter, an unexpected qs value appears in my 
parsed query.  The qs value that appears is the minimum of the mm setting and 
the number of tokens (bigrams, unigrams, or both) produced from the query string.

If I use a field in qf that has only bigrams, then qs is set to MIN(original mm 
setting, number of bigrams in query string)

arg sent in:    q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 2 bigrams

debugQuery
        <str name="rawquerystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
        <str name="querystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
        <str name="parsedquery">(+DisjunctionMaxQuery((((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01) ())/no_coord</str>
        <str name="parsedquery_toString">+(((cjk_bi_search:旧小 cjk_bi_search:小说)~2))~0.01 ()</str>


If I use a field in qf that has only unigrams, then qs is set to MIN(original 
mm setting, number of unigrams in query string)

arg sent in:    q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams

debugQuery
        <str name="rawquerystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
        <str name="querystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
        <str name="parsedquery">(+DisjunctionMaxQuery((((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord</str>
        <str name="parsedquery_toString">+(((cjk_uni_search:旧 cjk_uni_search:小 cjk_uni_search:说)~3))~0.01 ()</str>


If I use a field in qf that has both bigrams and unigrams, then qs is set to 
MIN(original mm setting, number of bigrams + unigrams in query string). 

arg sent in:    q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
   旧小说   is 3 chars, so 3 unigrams + 2 bigrams = 5

debugQuery
        <str name="rawquerystring">{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说</str>
        <str name="querystring">{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说</str>
        <str name="parsedquery">(+DisjunctionMaxQuery((((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01) ())/no_coord</str>
        <str name="parsedquery_toString">+(((cjk_both_search:旧 cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说 cjk_both_search:说)~5))~0.01 ()</str>
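
As a rough illustration of the token counts in the three cases above, here is a minimal Python sketch that reproduces the terms seen in the parsed queries for a run of Han characters. It is not Lucene's CJKBigramFilter: the function name cjk_terms and its parameters are hypothetical, and it assumes the input is a single run of CJK characters at least two characters long.

```python
def cjk_terms(text, output_bigrams, output_unigrams):
    """Approximate the terms produced for one run of CJK characters.

    Hypothetical helper for illustration only; it ignores non-CJK text,
    single-character runs, and everything else the real analysis chain does.
    """
    terms = []
    for i in range(len(text)):
        if output_unigrams:
            terms.append(text[i])          # single character
        if output_bigrams and i + 1 < len(text):
            terms.append(text[i:i + 2])    # overlapping pair of characters
    return terms


query = "旧小说"
bi_only = cjk_terms(query, output_bigrams=True, output_unigrams=False)
uni_only = cjk_terms(query, output_bigrams=False, output_unigrams=True)
both = cjk_terms(query, output_bigrams=True, output_unigrams=True)

for label, terms in (("bigrams", bi_only), ("unigrams", uni_only), ("both", both)):
    print(label, len(terms), terms)
```

The term counts (2, 3, and 5) are the same numbers that follow the ~ in the three parsed queries.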




I am running Solr 4.4.  I have fields defined like so:

    <fieldtype name="text_cjk_both" class="solr.TextField" positionIncrementGap="10000" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory" />
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
        <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true" outputUnigrams="true" />
      </analyzer>
    </fieldtype>
    <fieldtype name="text_cjk_bi" class="solr.TextField" positionIncrementGap="10000" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory" />
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
        <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
        <filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true" katakana="true" hangul="true" outputUnigrams="false" />
      </analyzer>
    </fieldtype>
    <fieldtype name="text_cjk_uni" class="solr.TextField" positionIncrementGap="10000" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.ICUTokenizerFactory" />
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.ICUTransformFilterFactory" id="Traditional-Simplified"/>
        <filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldtype>


The request handler uses edismax:

  <requestHandler name="search" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="mm">6&lt;-1 6&lt;90%</str>
      <int name="qs">1</int>
      <int name="ps">0</int>
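
For comparison, here is a sketch of how the mm spec in the defaults above would evaluate for 2, 3, and 5 optional clauses. It assumes the conditional mm semantics described in the Solr documentation: a condition N<X applies only when there are more than N optional clauses, and otherwise all clauses are required. The function name min_should_match is hypothetical and this is not Solr's own code. Note that the spec is written with &lt; in the XML config but with a literal < here.

```python
def min_should_match(spec, clause_count):
    """Evaluate a Solr-style mm spec for a number of optional clauses.

    Hypothetical re-implementation for illustration; assumes the documented
    semantics for integers, percentages, and 'N<X' conditional expressions.
    """
    def simple(expr):
        if expr.endswith("%"):
            percent = int(expr[:-1])
            # Truncate toward zero, as an integer cast would.
            delta = int(clause_count * percent / 100)
            return clause_count + delta if percent < 0 else delta
        n = int(expr)
        # A negative value means "all but that many" clauses.
        return clause_count + n if n < 0 else n

    result = clause_count  # all clauses required unless a condition applies
    if "<" in spec:
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if clause_count <= int(bound):
                break  # this and later conditions do not apply
            result = simple(expr)
    else:
        result = simple(spec)
    return max(0, min(result, clause_count))


mm_spec = "6<-1 6<90%"
results = {n: min_should_match(mm_spec, n) for n in (2, 3, 5, 10)}
print(results)
```

Under those assumptions the spec requires all clauses for the 2, 3, and 5 term queries shown earlier, which would give the same numbers that follow the ~ in the parsed queries.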

