[
https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man resolved SOLR-5212.
----------------------------
Resolution: Not A Problem
Naomi: if you have questions, confusion, or problems using Solr, please ask on
the solr-user mailing list and only file bugs once there is confirmation of a
problem in Solr itself.
In particular, your initial report is confusing for a few reasons...
1) You mentioned that the value of "qs" is being set based on the number of
bigrams -- however, nothing in your comments suggests that the "qs" param is
coming into play here at all. "qs" specifies the query slop applied to phrase
queries created from explicit phrase queries in the input query string, and
nothing in your example input or example debug output suggests any
PhraseQueries are ever being built.
2) The number you seem to be commenting on in each case is the minNrShouldMatch
on each of the top-level BooleanQueries produced from your input. Since your
configured mm is {{6<-1 6<90%}}, the smallest minNrShouldMatch value that will
ever be programmatically assigned is "6", but all of your example queries have
fewer than 6 clauses, so the minNrShouldMatch used in each case is the
total number of query clauses -- i.e., in each case where you have N "SHOULD"
clauses in the final query, all N clauses must match.
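
To make the arithmetic in point 2 concrete, here is a rough Python sketch of how
an mm spec could be turned into a minNrShouldMatch value for a given number of
optional clauses. It follows the documented mm semantics (conditional "N<spec"
entries evaluated left to right, negative values meaning "all but", percentages
rounded down); the function name min_should_match is made up for illustration
and this is not Solr's actual code.

```python
import re


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Illustrative only: map a clause count to a minNrShouldMatch value."""
    result = optional_clauses
    spec = spec.strip()

    if "<" in spec:
        # Conditional spec(s) such as "6<-1 6<90%", evaluated left to right.
        spec = re.sub(r"\s*<\s*", "<", spec)
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(bound):
                # At or below this threshold: keep the value computed so far
                # (initially "all clauses required").
                return result
            result = min_should_match(optional_clauses, sub_spec)
        return result

    if spec.endswith("%"):
        # Percentage: truncate toward zero; negative means "all but X%".
        raw = optional_clauses * int(spec[:-1]) / 100
        result = optional_clauses + int(raw) if raw < 0 else int(raw)
    else:
        # Plain integer: negative means "all but N clauses".
        value = int(spec)
        result = optional_clauses + value if value < 0 else value

    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(optional_clauses, result))


if __name__ == "__main__":
    for n in (2, 3, 5, 7):
        print(n, min_should_match(n, "6<-1 6<90%"))
```

Under these assumptions, 2, 3 and 5 clauses yield 2, 3 and 5 (matching the ~2,
~3 and ~5 in the debug output below), while 7 clauses would yield 6.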
---
Please start a thread on the solr-user mailing list, providing all of the
details you included in this issue, along with some specifics about what you
expect/desire to have happen and how the actual behavior you are observing
differs from those expectations.
> bad qs and mm when using edismax for field with CJKBigramFilter
> ----------------------------------------------------------------
>
> Key: SOLR-5212
> URL: https://issues.apache.org/jira/browse/SOLR-5212
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 4.4
> Reporter: Naomi Dushay
> Priority: Critical
>
> When I have a field using CJKBigramFilter, a mysterious qs value (or what I
> take as qs, because it shows as ~x after the first DisjunctionMaxQuery)
> appears in my parsed query. The qs value that appears is the minimum of:
> mm setting, number of bigrams in query string.
> This makes no sense, from a retrieval standpoint. It could possibly make
> sense to adjust the ps value, but certainly not the qs. Moreover, changing
> the mm setting via an HTTP param can affect the qs, but sending in a qs
> parameter has no effect on the qs in the parsed query.
> If I use a field in qf that has only bigrams, then qs is set to MIN(original
> mm setting, number of bigrams in query string)
> arg sent in: q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
> 旧小说 is 3 chars, so 2 bigrams
> debugQuery
> <str name="rawquerystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
> <str name="querystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((((cjk_bi_search:旧小
> cjk_bi_search:小说)~2))~0.01) ())/no_coord</str>
> <str name="parsedquery_toString">+(((cjk_bi_search:旧小
> cjk_bi_search:小说)~2))~0.01 ()</str>
> If I use a field in qf that has only unigrams, then qs is set to MIN(original
> mm setting, number of unigrams in query string)
> arg sent in: q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
> 旧小说 is 3 chars, so 3 unigrams
> debugQuery
> <str name="rawquerystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
> <str name="querystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((((cjk_uni_search:旧
> cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord</str>
> <str name="parsedquery_toString">+(((cjk_uni_search:旧 cjk_uni_search:小
> cjk_uni_search:说)~3))~0.01 ()</str>
> If I use a field in qf that has both bigrams and unigrams, then qs is set to
> MIN(original mm setting, number of bigrams + unigrams in query string).
> arg sent in: q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
> 旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5
> debugQuery
> <str name="rawquerystring">{!qf=cjk_both_pub_search pf= pf2=
> pf3=}旧小说</str>
> <str name="querystring">{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说</str>
> <str name="parsedquery">(+DisjunctionMaxQuery((((cjk_both_search:旧
> cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
> cjk_both_search:说)~5))~0.01) ())/no_coord</str>
> <str name="parsedquery_toString">+(((cjk_both_search:旧
> cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
> cjk_both_search:说)~5))~0.01 ()</str>
> I am running Solr 4.4. I have fields defined like so:
> <fieldtype name="text_cjk_both" class="solr.TextField"
> positionIncrementGap="10000" autoGeneratePhraseQueries="false">
> <analyzer>
> <tokenizer class="solr.ICUTokenizerFactory" />
> <filter class="solr.CJKWidthFilterFactory"/>
> <filter class="solr.ICUTransformFilterFactory"
> id="Traditional-Simplified"/>
> <filter class="solr.ICUTransformFilterFactory"
> id="Katakana-Hiragana"/>
> <filter class="solr.ICUFoldingFilterFactory"/>
> <filter class="solr.CJKBigramFilterFactory" han="true"
> hiragana="true" katakana="true" hangul="true" outputUnigrams="true" />
> </analyzer>
> </fieldtype>
> <fieldtype name="text_cjk_bi" class="solr.TextField"
> positionIncrementGap="10000" autoGeneratePhraseQueries="false">
> <analyzer>
> <tokenizer class="solr.ICUTokenizerFactory" />
> <filter class="solr.CJKWidthFilterFactory"/>
> <filter class="solr.ICUTransformFilterFactory"
> id="Traditional-Simplified"/>
> <filter class="solr.ICUTransformFilterFactory"
> id="Katakana-Hiragana"/>
> <filter class="solr.ICUFoldingFilterFactory"/>
> <filter class="solr.CJKBigramFilterFactory" han="true"
> hiragana="true" katakana="true" hangul="true" outputUnigrams="false" />
> </analyzer>
> </fieldtype>
> <fieldtype name="text_cjk_uni" class="solr.TextField"
> positionIncrementGap="10000" autoGeneratePhraseQueries="false">
> <analyzer>
> <tokenizer class="solr.ICUTokenizerFactory" />
> <filter class="solr.CJKWidthFilterFactory"/>
> <filter class="solr.ICUTransformFilterFactory"
> id="Traditional-Simplified"/>
> <filter class="solr.ICUTransformFilterFactory"
> id="Katakana-Hiragana"/>
> <filter class="solr.ICUFoldingFilterFactory"/>
> </analyzer>
> </fieldtype>
> The request handler uses edismax:
> <requestHandler name="search" class="solr.SearchHandler" default="true">
> <lst name="defaults">
> <str name="defType">edismax</str>
> <str name="q.alt">*:*</str>
> <str name="mm">6<-1 6<90%</str>
> <int name="qs">1</int>
> <int name="ps">0</int>