Hongtai Xue:

First, many thanks for reporting this in such detail; it really helps, and it’s
obvious you’ve dug into the problem rather than just throwing it over the wall.

Please do raise a JIRA; no matter what, the behaviors should be the same.
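For the JIRA, here is a minimal sketch (Python 3, standard library only) that sends both orderings to the books collection with debug=query and compares the clauses each parsedquery keeps. It decodes the hex byte ranges back into strings, so “name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]” reads as name_str:Jhereg. The helper names are hypothetical, the URL is the one from the reproduction steps, and the regexes only cover the clause shapes in the debug output from the report, not the full query syntax.

```python
import json
import re
import urllib.parse
import urllib.request

# Plain term clauses such as cat:book (assumed shape, taken from the report).
TERM = re.compile(r"(\w+):(\w+)")
# Byte-range clauses such as name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]].
BYTE_RANGE = re.compile(r"(\w+):\[\[([0-9a-f ]+)\] TO \[([0-9a-f ]+)\]\]")


def clauses(parsed: str) -> set:
    """Return the set of (field, value) pairs found in a parsedquery string."""
    found = set(TERM.findall(parsed))
    for field, low, high in BYTE_RANGE.findall(parsed):
        if low == high:  # an exact-match range stands for a single value
            found.add((field, bytes.fromhex(low).decode("utf-8")))
    return found


def parsed_query(base_url: str, q: str) -> str:
    """Fetch debug.parsedquery for q from a running Solr collection."""
    params = urllib.parse.urlencode({"q": q, "debug": "query", "rows": 0, "wt": "json"})
    with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
        return json.load(resp)["debug"]["parsedquery"]


def compare(base_url: str) -> bool:
    """True if the ABAB and AABB orderings parse to the same clauses."""
    abab = "+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)"
    aabb = "+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)"
    first, second = (clauses(parsed_query(base_url, q)) for q in (abab, aabb))
    print("ABAB:", sorted(first))
    print("AABB:", sorted(second))
    return first == second
```

Calling compare("http://localhost:8983/solr/books") against the setup in the report should print the ABAB set without ('name_str', 'Foundation') and return False for as long as the behavior you describe persists.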

One caution: Searching on a docValues=“true” indexed=“false” field will not be
performant as the index grows, last I knew (think “table scan”). DocValues is
specifically designed to answer the question “for doc y, what is the value of
field x?”, while this form asks “for value x, which docs contain it?”. At least
check with a reasonably large data set before allowing that in your app.
Personally, I’d like to see the ability to search on a dv-only field 
restricted, but that’s another story...

That is not to say the behavior you’re reporting is OK, it’s not. Just a 
caution for you going forward.
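A toy, in-memory illustration of the “table scan” point (Python; the field values and function names are invented, and this is not how Lucene actually lays data out on disk): a docValues-style column maps doc id to value, so finding the docs that hold a value means visiting every doc, whereas an inverted index maps value to doc ids and answers with a single lookup.

```python
from collections import defaultdict

# DocValues-style column for one field: position = doc id, entry = value.
doc_values = ["book", "cd", "book", "dvd"]


def build_postings(column):
    """Inverted index: value -> list of doc ids, built once up front."""
    postings = defaultdict(list)
    for doc_id, value in enumerate(column):
        postings[value].append(doc_id)
    return dict(postings)


def search_doc_values(column, value):
    """'Which docs contain value?' against the column: visits every doc."""
    return [doc_id for doc_id, v in enumerate(column) if v == value]


def search_postings(postings, value):
    """The same question against the inverted index: one dictionary lookup."""
    return postings.get(value, [])


postings = build_postings(doc_values)
```

Both searches return [0, 2] for "book"; the difference is that the column search does work proportional to the number of docs on every query, while the postings map paid that cost once when it was built.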

Best,
Erick

> On Mar 2, 2020, at 03:45, Hongtai Xue <h...@yahoo-corp.jp> wrote:
> 
> 
> Hi,
>  
> Our team found strange behavior in the Solr query parser.
> In some specific cases, conditional clauses on an unindexed field are
> ignored.
>  
> For a query like q=A:1 OR B:1 OR A:2 OR B:2,
> if field B is not indexed (but docValues="true"), "B:1" will be lost.
>  
> But if you write the query as q=A:1 OR A:2 OR B:1 OR B:2,
> it works perfectly.
>  
> The only difference between the two queries is the order of the clauses:
> one is ABAB, the other is AABB.
>  
> ■ Reproduction steps and explanation
> You can easily reproduce this problem on a Solr collection using the _default
> configset and the exampledocs/books.csv data.
>  
> 1. Create a collection with the _default configset:
> bin/solr create -c books -s 2 -rf 2
>  
> 2. Post books.csv:
> bin/post -c books example/exampledocs/books.csv
>  
> 3. Run the following query:
> http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+cat%3Abook+OR+name_str%3AJhereg+OR+cat%3Acd%29&debug=query
>  
>  
> Here is the query-parsing debug output.
> You can see that "name_str:Foundation" is lost.
>  
> Query: "name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd"
> (please note that, in hex, "Jhereg" is "4a 68 65 72 65 67" and "Foundation" is "46 6f 75 6e 64 61 74 69 6f 6e")
> --------
>   "debug":{
>     "rawquerystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
>     "querystring":"+(name_str:Foundation OR cat:book OR name_str:Jhereg OR cat:cd)",
>     "parsedquery":"+(cat:book cat:cd (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
>     "parsedquery_toString":"+(cat:book cat:cd name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])",
>     "QParser":"LuceneQParser"}}
> --------
>  
> But for the query "name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd",
> everything is OK: "name_str:Foundation" is not lost.
> --------
>   "debug":{
>     "rawquerystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
>     "querystring":"+(name_str:Foundation OR name_str:Jhereg OR cat:book OR cat:cd)",
>     "parsedquery":"+(cat:book cat:cd ((name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]]) (name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]])))",
>     "parsedquery_toString":"+(cat:book cat:cd (name_str:[[46 6f 75 6e 64 61 74 69 6f 6e] TO [46 6f 75 6e 64 61 74 69 6f 6e]] name_str:[[4a 68 65 72 65 67] TO [4a 68 65 72 65 67]]))",
>     "QParser":"LuceneQParser"}}
> --------
> http://localhost:8983/solr/books/select?q=%2B%28name_str%3AFoundation+OR+name_str%3AJhereg+OR+cat%3Abook+OR+cat%3Acd%29&debug=query
>  
> We did a little research, and we wonder if it is a bug in SolrQueryParser.
> More specifically, we think the if statement here might be wrong:
> https://github.com/apache/lucene-solr/blob/branch_8_4/solr/core/src/java/org/apache/solr/parser/SolrQueryParserBase.java#L711
>  
> Could you please tell us whether it is a bug or just an incorrect query?
>  
> Thanks,
> Hongtai Xue
