[
https://issues.apache.org/jira/browse/LUCENE-7897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16080551#comment-16080551
]
Adrien Grand commented on LUCENE-7897:
--------------------------------------
bq. Maybe we could refactor this if we can pass the "#matchingdocs or
minScore" to the place where we decide the scorer.
Would you like to give it a try?
> RangeQuery optimization in IndexOrDocValuesQuery
> -------------------------------------------------
>
> Key: LUCENE-7897
> URL: https://issues.apache.org/jira/browse/LUCENE-7897
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Affects Versions: trunk, 7.0
> Reporter: Murali Krishna P
>
> For range queries, Lucene uses either points or doc values based on cost
> estimation
> (https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/search/IndexOrDocValuesQuery.html).
> The scorer is chosen based on the minimum cost here:
> https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/Boolean2ScorerSupplier.java#L16
> However, the cost calculations for TermQuery and IndexOrDocValuesQuery seem to
> carry the same weight. Essentially, the cost depends on the docFreq in the
> term dictionary, the number of points visited, and the number of doc values.
> When the docFreq is not very restrictive, doc values require a lot of
> lookups, and using points would have been better.
> The following query, with 1M matches, takes 60 ms with doc values but only
> 27 ms with points. If I change the query to "message:*", which matches all
> docs, it chooses points (since the cost is the same), but with message:xyz it
> chooses doc values even though the doc frequency is 1 million, which results
> in many doc-value fetches. Would it make sense to make the cost of the doc
> values query higher, or to use points if the docFreq is too high for the term
> query (find an optimum threshold where points cost < doc values cost)?
> {noformat}
> {
>   "query": {
>     "bool": {
>       "must": [
>         {
>           "query_string": {
>             "query": "message:xyz"
>           }
>         },
>         {
>           "range": {
>             "@timestamp": {
>               "gte": 1498652400000,
>               "lte": 1498905000000,
>               "format": "epoch_millis"
>             }
>           }
>         }
>       ]
>     }
>   }
> }
> {noformat}
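
The trade-off raised in the description can be illustrated with a small, self-contained sketch. This is not Lucene code: the class `RangeExecutionChooser`, the `Strategy` enum, and the `docValuesPenalty` parameter are hypothetical names introduced only for illustration. It assumes the points cost is roughly the number of documents matching the range, and the doc values cost is one lookup per candidate document from the lead clause (here, the term query), scaled by a penalty factor.

```java
// Hypothetical sketch, not Lucene's implementation: pick how to execute a
// range clause given the cost of the clause that leads the iteration.
final class RangeExecutionChooser {

    enum Strategy { POINTS, DOC_VALUES }

    /**
     * @param leadCost         estimated number of docs produced by the lead clause
     *                         (for example, the docFreq of the term query)
     * @param rangeCost        estimated number of docs matching the range via points
     * @param docValuesPenalty assumed relative cost of one doc-value lookup
     *                         compared to visiting one point (1.0 = same weight)
     */
    static Strategy choose(long leadCost, long rangeCost, double docValuesPenalty) {
        // Points: work is proportional to the number of docs in the range.
        double pointsCost = (double) rangeCost;
        // Doc values: one lookup per candidate doc from the lead clause.
        double docValuesCost = leadCost * docValuesPenalty;
        // Ties go to points, as in the "message:*" case from the description.
        return docValuesCost < pointsCost ? Strategy.DOC_VALUES : Strategy.POINTS;
    }

    public static void main(String[] args) {
        // A term with docFreq of 1M against a range matching 2M docs (made-up numbers).
        System.out.println(choose(1_000_000L, 2_000_000L, 1.0));
        System.out.println(choose(1_000_000L, 2_000_000L, 8.0));
    }
}
```

With equal weights, a lead clause slightly cheaper than the range still selects doc values even when it produces a million candidates. Raising the penalty shifts that decision toward points, while a highly selective lead clause still selects doc values. The penalty value of 8.0 is an arbitrary choice for the example, not a measured threshold.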