The *_str variant produced by the _default configset is DocValues only, and thus 
intended primarily for faceting and sorting.
Try changing this line in your schema

<dynamicField name="*_str" type="strings" docValues="true" indexed="false" 
stored="false" useDocValuesAsStored="false"/>

to

<dynamicField name="*_str" type="strings" docValues="true" indexed="true" 
stored="false" useDocValuesAsStored="false"/>

…and it will both work and be more performant.

But also file a JIRA since it is obviously a bug - matching a string from 
DocValues should still work even if slow.
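
To illustrate why indexed="true" should be faster for matching, here is a rough Python sketch of the two lookup directions. It is a toy model with made-up data and hypothetical function names, not how Lucene actually stores either structure.

```python
from collections import defaultdict

# Hypothetical toy data: doc id -> value of a single-valued string field.
docs = {0: "Foundation", 1: "Jhereg", 2: "Foundation"}

# DocValues-like column: position = doc id, answers "doc -> value" directly.
doc_values = [docs[i] for i in sorted(docs)]

# Inverted-index-like map: answers "value -> doc ids" directly.
inverted = defaultdict(list)
for doc_id in sorted(docs):
    inverted[docs[doc_id]].append(doc_id)

def search_doc_values(term):
    # Has to look at every document: cost grows with the index size.
    return [doc_id for doc_id, value in enumerate(doc_values) if value == term]

def search_inverted(term):
    # A single lookup, independent of how many documents do not match.
    return inverted.get(term, [])
```

Both functions return the same doc ids; only the amount of work differs, which is the reason to index the field if you search on it.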

Jan

> On 2 Mar 2020, at 15:35, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> Hongtai Xue:
> 
> First, many thanks for reporting this in such detail, it really helps and 
> it’s obvious you’ve dug into the problem rather than just thrown it over the 
> wall.
> 
> Please do raise a JIRA; no matter what, the behavior should be the same.
> 
> One caution: Searching on a docValues=“true” indexed=“false” field will not 
> be performant as the index grows, last I knew (think “table scan”). DocValues 
> is specifically designed to answer the question “for doc y, what is the value 
> of field x” and this form is asking “for value x, what docs contain it”. At 
> least check with a reasonably large data set before allowing that in your 
> app. Personally, I’d like to see the ability to search on a dv-only field 
> restricted, but that’s another story...
> 
> That is not to say the behavior you’re reporting is OK, it’s not. Just a 
> caution for you going forward.
> 
> Best,
> Erick
> 
>> On Mar 2, 2020, at 03:45, Hongtai Xue <h...@yahoo-corp.jp> wrote:
>> 
>> 
>> Hi,
>>  
>> Our team found some strange behavior in the Solr query parser.
>> In some specific cases, some conditional clauses on an unindexed field will 
>> be ignored.
>>  
>> For a query like q=A:1 OR B:1 OR A:2 OR B:2,
>> if field B is not indexed (but docValues="true"), "B:1" will be lost.
>>  
>> But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
>> it works perfectly.
>>  
>> The only difference between the two queries is the order in which the 
>> clauses are written:
>> one is ABAB, the other is AABB.
>>  
>> ■reproduce steps and example explanation
>> you can easily reproduce this problem on a solr collection with _default 
>> configset and exampledocs/books.csv data.
>>  
>> 1. create a _default collection
>> bin/solr create -c books -s 2 -rf 2
>>  
>> 2. post books.csv.
>> bin/post -c books example/exampledocs/books.csv
>>  
>> 3. run following query.
>> http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
>>  
>>  
>> I printed query parsing debug information.
>> you can tell "name_str:Foundation" is lost.
>>  
>> query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
>> (please note "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is "46 6f 75 
>> 6e 64 61 74 69 6f 6e")
>> --------
>>   "debug":{
>>     "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg 
>> OR cat:cd)",
>>     "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR 
>> cat:cd)",
>>     "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 
>> 68 65 72 65 67]]))",
>>     "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] 
>> TO [4a 68 65 72 65 67]])",
>>     "QParser":"LuceneQParser"}}
>> --------
>>  
>> but for query: "name_str:Foundation OR name_str:Jhereg OR cat:book OR 
>> cat:cd",
>> everything is OK. "name_str:Foundation" is not lost.
>> --------
>>   "debug":{
>>     "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book 
>> OR cat:cd)",
>>     "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR 
>> cat:cd)",
>>     "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 
>> 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO 
>> [4a 68 65 72 65 67]])))",
>>     "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 
>> 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 
>> 67] TO [4a 68 65 72 65 67]]))",
>>     "QParser":"LuceneQParser"}}
>> --------
>> http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query
>>  
>> We did a little research, and we wonder whether it is a bug in 
>> SolrQueryParser.
>> More specifically, we think the if statement here might be wrong.
>> https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711
>>  
>> Could you please tell us whether it is a bug, or whether the query 
>> statement is simply wrong?
>>  
>> Thanks,
>> Hongtai Xue
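
For anyone who wants to compare the two clause orderings from the report side by side, the following Python sketch rebuilds both debug URLs. It assumes the same local Solr instance and "books" collection as the reproduction steps; `debug_url` is a hypothetical helper name, not part of Solr or any client library.

```python
from urllib.parse import urlencode

# Base URL taken from the report; assumes a local Solr with a "books" collection.
BASE = "http://localhost:8983/solr/books/select"

def debug_url(clauses):
    """Build a select URL that ORs the clauses and requests query debug output."""
    q = "+(" + " OR ".join(clauses) + ")"
    return BASE + "?" + urlencode({"q": q, "debug": "query"})

# ABAB order: the report says "name_str:Foundation" is dropped here.
interleaved = debug_url(
    ["name_str:Foundation", "cat:book", "name_str:Jhereg", "cat:cd"])

# AABB order: the report says all clauses survive here.
grouped = debug_url(
    ["name_str:Foundation", "name_str:Jhereg", "cat:book", "cat:cd"])

print(interleaved)
print(grouped)
```

Fetching each URL and comparing the "parsedquery" entries in the debug output should show whether a clause has gone missing.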
