[jira] [Commented] (LUCENE-7246) Can LRUQueryCache reuse DocIdSets that are created by some queries anyway?

Adrien Grand (JIRA) Thu, 02 Jun 2016 08:24:46 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312469#comment-15312469 ]


Adrien Grand commented on LUCENE-7246:
--------------------------------------

bq. Another possibility is having the LRUQueryCache not actually cache on the 
first hit, requiring a 2nd hit?

The cache already requires a second hit for point queries. I guess the 
frustration is that since these queries need to execute on the whole index, 
we want to cache them quite early, but creating a cache entry is not cheap 
and can make things unnecessarily slower when most ranges are used only 
eg. 2 or 3 times. I don't think it is a big deal if we still do this copy for 
point queries, but I wanted to open this issue for discussion to see if we 
could find a clean way.
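
To make the "require a second hit" idea concrete, here is a minimal sketch of 
a policy that only says "cache now" once a query has been seen a given number 
of times. The class name MinHitsCachingPolicy, the method onUse and the use of 
a String key are assumptions made for illustration; this is not Lucene's 
caching policy code, and it never forgets old queries the way a real policy 
would have to.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: cache a query only after it has been used minHits times. */
public class MinHitsCachingPolicy {
    private final int minHits;
    // Usage count per query key. Unbounded here; a real policy would bound its history.
    private final Map<String, Integer> seen = new HashMap<>();

    public MinHitsCachingPolicy(int minHits) {
        if (minHits < 1) {
            throw new IllegalArgumentException("minHits must be >= 1");
        }
        this.minHits = minHits;
    }

    /** Records one use of the query and reports whether it should be cached now. */
    public synchronized boolean onUse(String queryKey) {
        int hits = seen.merge(queryKey, 1, Integer::sum);
        return hits >= minHits;
    }
}
```

With minHits = 2, the first use of a range only records it, and the copy into 
a cache entry is paid from the second use onwards. A range that is used 2 or 
3 times in total therefore still pays for the copy while getting little reuse 
out of it, which is the cost discussed above.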

bq. In your patch LRUCache assumes whatever DocIdSet this returns is suitable 
to be cached.

Agreed it is not obvious. Now that Filter is gone, maybe we should update the 
DocIdSet javadocs in order to be explicit about the fact that DocIdSets need to 
be cacheable (there is no reason anymore to write non-cacheable doc id sets?). 
And maybe also add cacheable to the method name as you suggest.
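
As a rough illustration of what an explicit cacheability contract could look 
like, here is a small self-contained sketch. The DocIds interface, the 
ArrayDocIds class and the toCacheEntry method are hypothetical stand-ins 
invented for this example; they are not Lucene's DocIdSet API and not what 
the attached patch does.

```java
/** Hypothetical sketch: reuse a doc id set when it declares itself cacheable, else copy it. */
public class CacheableDocIdSetSketch {

    /** Stand-in for a doc id set (assumption, not Lucene's DocIdSet). */
    public interface DocIds {
        /** Returns the matching doc ids in increasing order. */
        int[] toSortedArray();

        /** True if this instance is immutable and safe to keep in a cache. */
        boolean isCacheable();
    }

    /** Immutable array-backed set; always safe to cache. */
    public static final class ArrayDocIds implements DocIds {
        private final int[] docs;

        public ArrayDocIds(int[] sortedDocs) {
            this.docs = sortedDocs.clone(); // defensive copy keeps the set immutable
        }

        @Override
        public int[] toSortedArray() {
            return docs.clone();
        }

        @Override
        public boolean isCacheable() {
            return true;
        }
    }

    /** Returns the instance to store in the cache, copying only when required. */
    public static DocIds toCacheEntry(DocIds set) {
        if (set.isCacheable()) {
            return set; // reuse the set the query already built, no conversion
        }
        return new ArrayDocIds(set.toSortedArray()); // otherwise pay for the copy
    }
}
```

If every DocIdSet were cacheable by contract, the isCacheable check would go 
away and the cache could always take the first branch.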


> Can LRUQueryCache reuse DocIdSets that are created by some queries anyway?
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-7246
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7246
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7246.patch, LUCENE-7246.patch
>
>
> Some queries need to create a DocIdSet to work. This is for instance the case 
> with TermsQuery, multi-term queries, point-in-set queries and point range 
> queries. We cache them more aggressively because these queries need to 
> evaluate all matches on a segment before they can return a Scorer. But this 
> can also be dangerous: if there is little reuse, then we keep converting the 
> doc id sets that these queries create to another DocIdSet.
> This worries me a bit eg. for point range queries: they made numeric ranges 
> faster in practice so I would not like caching to make them appear slower 
> than they are when caching is disabled.
> So I would like to somehow bring back the optimization that we had in 1.x 
> with DocIdSet.isCacheable so that we do not need to convert DocIdSet 
> instances when we could just reuse existing instances.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
