[jira] [Comment Edited] (LUCENE-8213) offload caching to a dedicated threadpool

Amir Hadadi (JIRA) Mon, 19 Mar 2018 05:47:24 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16404750#comment-16404750
 ]


Amir Hadadi edited comment on LUCENE-8213 at 3/19/18 12:46 PM:
---------------------------------------------------------------

[~rcmuir] the issue is that the execution path for (q1 AND q2) depends on 
whether q2 gets cached or not.
 When q2 does not get cached, doc values are used to execute q2 and only the 
single document matching q1 is evaluated against the range.
 When q2 gets cached, it gets cached as a query that stands by itself, i.e. not 
in the context of (q1 AND q2).
 So all 10M documents that q2 matches are scanned in the BKD tree and cached 
into a bit set.
 To keep the caching of q2 from making the latency of (q1 AND q2) too high, 
[~jpountz] added maxCostFactor.
 The check compares the cost of caching q2 with the cost of evaluating 
(q1 AND q2): if caching q2 is at least maxCostFactor times more expensive, q2 
is not cached.
 This is the relevant code from LRUQueryCache:

 
{code:java}
double costFactor = (double) inSupplier.cost() / leadCost;
if (costFactor >= maxCostFactor) {
  // too costly, caching might make the query much slower
  return inSupplier.get(leadCost);
}{code}
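
To make the check concrete, here is a minimal sketch that applies the same arithmetic to the numbers from this example (q2 matching 10M documents, q1 leading with a single document). The class name, the helper method and the maxCostFactor value of 100 are assumptions made for illustration; they are not taken from LRUQueryCache.

```java
final class CostFactorExample {
  /**
   * Mirrors the quoted check: returns true when caching would be skipped
   * because it is considered too costly relative to the lead iterator.
   */
  public static boolean skipCaching(long supplierCost, long leadCost, double maxCostFactor) {
    double costFactor = (double) supplierCost / leadCost;
    return costFactor >= maxCostFactor;
  }

  public static void main(String[] args) {
    double maxCostFactor = 100; // hypothetical value, chosen only for illustration

    // q2 matches 10M documents, q1 leads with a single document:
    // costFactor = 10,000,000, so caching is skipped.
    System.out.println(skipCaching(10_000_000L, 1L, maxCostFactor)); // prints "true"

    // q2 matches 50 documents, q1 leads with a single document:
    // costFactor = 50, so caching goes ahead on the querying thread.
    System.out.println(skipCaching(50L, 1L, maxCostFactor)); // prints "false"
  }
}
```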
 

My suggestion is to always evaluate (q1 AND q2) using the optimal execution 
path, and to cache q2 asynchronously.
 A refinement is to cache q2 synchronously when the cost of caching it is not 
too high.
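
A rough sketch of what this could look like, independent of Lucene: the querying thread always gets its answer from the optimal path, and the expensive cache population is handed to a separate executor. Everything here is hypothetical (the class AsyncCachingSketch, its method names, the use of plain Supplier values in place of scorers and doc id sets, and the maxCostFactor value of 100); it only illustrates the control flow and is not a proposed patch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Lucene. */
final class AsyncCachingSketch<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Executor cachingPool;
  private final double maxCostFactor;

  public AsyncCachingSketch(Executor cachingPool, double maxCostFactor) {
    this.cachingPool = cachingPool;
    this.maxCostFactor = maxCostFactor;
  }

  /**
   * Returns the cached entry if there is one. Otherwise the entry is built
   * synchronously when that is cheap enough, or scheduled on the caching
   * pool while the caller is answered through the optimal path.
   */
  public V get(K key, long cachingCost, long leadCost,
               Supplier<V> buildCacheEntry, Supplier<V> optimalPath) {
    V cached = cache.get(key);
    if (cached != null) {
      return cached;
    }
    double costFactor = (double) cachingCost / leadCost;
    if (costFactor < maxCostFactor) {
      // Cheap enough: cache on the querying thread (the refinement above).
      return cache.computeIfAbsent(key, k -> buildCacheEntry.get());
    }
    // Too costly: populate the cache off the querying thread.
    // computeIfAbsent ensures the entry is built at most once per key.
    cachingPool.execute(() -> cache.computeIfAbsent(key, k -> buildCacheEntry.get()));
    return optimalPath.get();
  }

  public boolean isCached(K key) {
    return cache.containsKey(key);
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    AsyncCachingSketch<String, String> sketch = new AsyncCachingSketch<>(pool, 100);

    // First call: q2 is expensive to cache, so the optimal path answers.
    String first = sketch.get("q2", 10_000_000L, 1L, () -> "bitset", () -> "doc-values");

    // Wait for the background caching task to finish.
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);

    // Second call: the cached entry is now available.
    String second = sketch.get("q2", 10_000_000L, 1L, () -> "bitset", () -> "doc-values");
    System.out.println(first + " then " + second); // prints "doc-values then bitset"
  }
}
```

With this shape, the latency of the first (q1 AND q2) no longer depends on whether q2 happens to be picked for caching; only later queries see the cached entry.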


> offload caching to a dedicated threadpool
> -----------------------------------------
>
>                 Key: LUCENE-8213
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8213
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>    Affects Versions: 7.2.1
>            Reporter: Amir Hadadi
>            Priority: Minor
>              Labels: performance
>
> IndexOrDocValuesQuery allows combining non-selective range queries with a 
> selective lead iterator in an optimized way. However, the range query at some 
> point gets cached by a querying thread in LRUQueryCache, which negates the 
> optimization of IndexOrDocValuesQuery for that specific query.
> It would be nice to see a caching implementation that offloads to a different 
> thread pool, so that queries involving IndexOrDocValuesQuery would have 
> consistent performance characteristics.



