Murali Krishna P created LUCENE-7897:
----------------------------------------

             Summary: RangeQuery optimization in IndexOrDocValuesQuery 
                 Key: LUCENE-7897
                 URL: https://issues.apache.org/jira/browse/LUCENE-7897
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
    Affects Versions: trunk, 7.0
            Reporter: Murali Krishna P


For range queries, Lucene uses either points or doc values, depending on a cost 
estimate 
(https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/IndexOrDocValuesQuery.html).
The scorer is chosen based on the minCost here: 
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L16

However, the cost calculations for TermQuery and IndexOrDocValuesQuery seem to 
carry the same weight. Essentially, the cost depends on the docFreq in the term 
dictionary, the number of points visited, and the number of doc values. When the 
term's docFreq is not very restrictive, the doc values path performs a lot of 
lookups, and using points would have been better.

The following query, with 1M matches, takes 60ms with doc values but only 27ms 
with points. If I change the query to "message:*", which matches all docs, it 
chooses points (since the cost is the same), but with message:xyz it chooses doc 
values even though the doc frequency is 1 million, which results in many doc 
value fetches. Would it make sense to make the cost of the doc values query 
higher, or to use points when the docFreq of the term query is too high (i.e. 
find an optimum threshold where points cost < doc values cost)?
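
To make the proposal concrete, here is a rough toy model of the decision. It is 
not Lucene's actual code: the class RangeCostSketch, the method choose, and the 
docValuesWeight parameter are all hypothetical names, and the points count is an 
assumed figure chosen only for illustration. It shows how weighting doc values 
lookups more heavily could flip the choice to points when the term's docFreq is 
high.

```java
// Toy cost model (hypothetical, not Lucene code) for choosing how to
// evaluate a range clause that is intersected with a term clause.
class RangeCostSketch {

    enum Strategy { POINTS, DOC_VALUES }

    /**
     * leadCost:        docFreq of the term query (docs to verify via doc values)
     * pointsCost:      estimated number of points visited by the range query
     * docValuesWeight: hypothetical multiplier for the per-doc lookup cost
     */
    static Strategy choose(long leadCost, long pointsCost, double docValuesWeight) {
        double docValuesCost = leadCost * docValuesWeight;
        // Ties go to points, matching the "message:*" behaviour described above.
        return docValuesCost < pointsCost ? Strategy.DOC_VALUES : Strategy.POINTS;
    }

    public static void main(String[] args) {
        long termDocFreq = 1_000_000L;   // message:xyz, from the report
        long pointsVisited = 1_500_000L; // assumed value, not measured

        // Equal weighting: doc values look cheaper and are chosen.
        System.out.println(choose(termDocFreq, pointsVisited, 1.0));
        // Doc values lookups weighted 2x: points are chosen instead.
        System.out.println(choose(termDocFreq, pointsVisited, 2.0));
    }
}
```

The right weight (or threshold) would have to come from benchmarks; the 2x used 
here is only a placeholder.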

{noformat}
{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "message:xyz"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": 1498652400000,
              "lte": 1498905000000,
              "format": "epoch_millis"
            }
          }
        }
      ]
    }
  }
}
{noformat}




