[
https://issues.apache.org/jira/browse/SOLR-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Naomi Dushay updated SOLR-5212:
-------------------------------
Description:
When I have a field using CJKBigramFilter, a mysterious qs value (or what I
take to be qs, because it shows as ~x after the first DisjunctionMaxQuery)
appears in my parsed query. The value that appears is the minimum of the mm
setting and the number of bigrams in the query string.
This makes no sense from a retrieval standpoint. It could possibly make sense
to adjust the ps value, but certainly not the qs.
If I use a field in qf that has only bigrams, then qs is set to MIN(original mm
setting, number of bigrams in query string)
arg sent in: q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 2 bigrams
debugQuery
<str name="rawquerystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_bi_search:旧小
cjk_bi_search:小说)~2))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_bi_search:旧小
cjk_bi_search:小说)~2))~0.01 ()</str>
If I use a field in qf that has only unigrams, then qs is set to MIN(original
mm setting, number of unigrams in query string)
arg sent in: q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 3 unigrams
debugQuery
<str name="rawquerystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_uni_search:旧
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_uni_search:旧 cjk_uni_search:小
cjk_uni_search:说)~3))~0.01 ()</str>
If I use a field in qf that has both bigrams and unigrams, then qs is set to
MIN(original mm setting, number of bigrams + unigrams in query string).
arg sent in: q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5
debugQuery
<str name="rawquerystring">{!qf=cjk_both_pub_search pf= pf2=
pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_both_search:旧
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
cjk_both_search:说)~5))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_both_search:旧
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
cjk_both_search:说)~5))~0.01 ()</str>
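The token counts quoted in the three cases above (2, 3 and 5) can be reproduced with a rough sketch. This is not the CJKBigramFilter implementation: `cjk_tokens` is a hypothetical helper that assumes the input is a single run of two or more Han characters and ignores every other analysis step in the field types.

```python
def cjk_tokens(text, bigrams=True, unigrams=False):
    """Roughly emulate the terms produced for one run of Han characters.

    Hypothetical helper for illustration only; it covers just the
    outputUnigrams on/off distinction discussed in this report.
    """
    tokens = []
    for i, ch in enumerate(text):
        if unigrams:
            tokens.append(ch)               # single-character term
        if bigrams and i + 1 < len(text):
            tokens.append(text[i:i + 2])    # overlapping two-character term
    return tokens


query = "旧小说"
bi_only = cjk_tokens(query, bigrams=True, unigrams=False)
uni_only = cjk_tokens(query, bigrams=False, unigrams=True)
both = cjk_tokens(query, bigrams=True, unigrams=True)
print(len(bi_only), len(uni_only), len(both))
```

For a run of n characters this gives n - 1 bigrams, n unigrams, and 2n - 1 terms when both are emitted, which matches the ~2, ~3 and ~5 values in the parsed queries.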
I am running Solr 4.4. I have fields defined like so:
<fieldtype name="text_cjk_both" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
katakana="true" hangul="true" outputUnigrams="true" />
</analyzer>
</fieldtype>
<fieldtype name="text_cjk_bi" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
katakana="true" hangul="true" outputUnigrams="false" />
</analyzer>
</fieldtype>
<fieldtype name="text_cjk_uni" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
</fieldtype>
The request handler uses edismax:
<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="mm">6<-1 6<90%</str>
<int name="qs">1</int>
<int name="ps">0</int>
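As a cross-check on the "minimum of mm setting and number of terms" observation, here is a minimal sketch of how a conditional mm spec such as `6<-1 6<90%` resolves for a given number of optional clauses. It follows the mm syntax as described in the Solr documentation and is not Solr's own code; `min_should_match` is a hypothetical name. It also assumes that the ~N printed after the boolean clause group is Lucene's minimum-should-match rendering, which is an interpretation and not something the debug output states.

```python
def min_should_match(clause_count, spec):
    """Resolve an mm spec to a required clause count (illustrative sketch)."""
    spec = spec.strip()
    result = clause_count
    if "<" in spec:
        # Conditional form: "N<expr" applies expr only when clause_count > N.
        for part in spec.split():
            bound, expr = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, expr)
        return result
    if spec.endswith("%"):
        # Percentage, truncated toward zero; negative means "all but this share".
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        # Plain integer; negative means "all but this many".
        calc = int(spec)
    result = clause_count + calc if calc < 0 else calc
    # Never require more clauses than exist, and never fewer than zero.
    return max(0, min(clause_count, result))


spec = "6<-1 6<90%"
for n in (2, 3, 5, 10):
    print(n, min_should_match(n, spec))
```

Under these assumptions, any query producing six or fewer terms requires all of them, so 2, 3 and 5 terms resolve to 2, 3 and 5, the same numbers that appear as ~2, ~3 and ~5 in the parsed queries.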
was:
When I have a field using CJKBigramFilter, a mysterious qs value appears in my
parsed query. The qs value that appears is the minimum of:
mm setting, number of bigrams in query string.
If I use a field in qf that has only bigrams, then qs is set to MIN(original mm
setting, number of bigrams in query string)
arg sent in: q={!qf=cjk_bi_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 2 bigrams
debugQuery
<str name="rawquerystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_bi_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_bi_search:旧小
cjk_bi_search:小说)~2))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_bi_search:旧小
cjk_bi_search:小说)~2))~0.01 ()</str>
If I use a field in qf that has only unigrams, then qs is set to MIN(original
mm setting, number of unigrams in query string)
arg sent in: q={!qf=cjk_uni_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 3 unigrams
debugQuery
<str name="rawquerystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_uni_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_uni_search:旧
cjk_uni_search:小 cjk_uni_search:说)~3))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_uni_search:旧 cjk_uni_search:小
cjk_uni_search:说)~3))~0.01 ()</str>
If I use a field in qf that has both bigrams and unigrams, then qs is set to
MIN(original mm setting, number of bigrams + unigrams in query string).
arg sent in: q={!qf=cjk_both_search pf= pf2= pf3=}旧小说
旧小说 is 3 chars, so 3 unigrams + 2 bigrams = 5
debugQuery
<str name="rawquerystring">{!qf=cjk_both_pub_search pf= pf2=
pf3=}旧小说</str>
<str name="querystring">{!qf=cjk_both_pub_search pf= pf2= pf3=}旧小说</str>
<str name="parsedquery">(+DisjunctionMaxQuery((((cjk_both_search:旧
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
cjk_both_search:说)~5))~0.01) ())/no_coord</str>
<str name="parsedquery_toString">+(((cjk_both_search:旧
cjk_both_search:旧小 cjk_both_search:小 cjk_both_search:小说
cjk_both_search:说)~5))~0.01 ()</str>
I am running Solr 4.4. I have fields defined like so:
<fieldtype name="text_cjk_both" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
katakana="true" hangul="true" outputUnigrams="true" />
</analyzer>
</fieldtype>
<fieldtype name="text_cjk_bi" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
<filter class="solr.CJKBigramFilterFactory" han="true" hiragana="true"
katakana="true" hangul="true" outputUnigrams="false" />
</analyzer>
</fieldtype>
<fieldtype name="text_cjk_uni" class="solr.TextField"
positionIncrementGap="10000" autoGeneratePhraseQueries="false">
<analyzer>
<tokenizer class="solr.ICUTokenizerFactory" />
<filter class="solr.CJKWidthFilterFactory"/>
<filter class="solr.ICUTransformFilterFactory"
id="Traditional-Simplified"/>
<filter class="solr.ICUTransformFilterFactory" id="Katakana-Hiragana"/>
<filter class="solr.ICUFoldingFilterFactory"/>
</analyzer>
</fieldtype>
The request handler uses edismax:
<requestHandler name="search" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="q.alt">*:*</str>
<str name="mm">6<-1 6<90%</str>
<int name="qs">1</int>
<int name="ps">0</int>
> bad qs and mm when using edismax for field with CJKBigramFilter
> ----------------------------------------------------------------
>
> Key: SOLR-5212
> URL: https://issues.apache.org/jira/browse/SOLR-5212
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 4.4
> Reporter: Naomi Dushay
> Priority: Critical
>