[ https://issues.apache.org/jira/browse/LUCENE-7246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15312425#comment-15312425 ]
David Smiley commented on LUCENE-7246:
--------------------------------------
Another possibility is having the LRUQueryCache not actually cache on the first
hit, requiring a 2nd hit? That obviously has its trade-offs too. I guess I
kind of like this patch better than doing that, even if it adds a new API
method on Weight. But it does intertwine two things -- returning a DocIdSet,
and whether or not this DocIdSet should be cached. In your patch LRUQueryCache
assumes whatever DocIdSet this returns is suitable to be cached. Maybe it is
sometimes but not other times? We could just override this method for the ones
where it is cacheable but, again, we're then intertwining concerns. Maybe
DocIdSet should have an isCacheable(), so that if it isn't _then_ we wrap in a
RoaringDocIdSet. Or if we really don't want that method, then have this new
method on Weight be named something like cacheableDocIdSet. Then it's clear
that the method should only be overridden when the Query/Weight already has one
(e.g. a bit set).
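
To make the two ideas above concrete (cache only on a 2nd hit, and reuse a
DocIdSet as-is when it says it is cacheable), here is a rough, self-contained
sketch. It does not use Lucene classes: SimpleDocIdSet, BitSetDocIdSet,
LazyDocIdSet, SketchCache and the minHits parameter are all hypothetical names
made up for illustration, and a plain java.util.BitSet copy stands in for the
conversion to a RoaringDocIdSet.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntPredicate;
import java.util.function.Supplier;

// Hypothetical sketch; none of these types are Lucene classes.
final class CacheableDocIdSetSketch {

    /** Simplified stand-in for a doc id set that can say whether it is cacheable. */
    interface SimpleDocIdSet {
        boolean contains(int docId);
        boolean isCacheable();
    }

    /** Already backed by a bit set, so the cache can keep this instance as-is. */
    static final class BitSetDocIdSet implements SimpleDocIdSet {
        private final BitSet bits;
        BitSetDocIdSet(BitSet bits) { this.bits = bits; }
        @Override public boolean contains(int docId) { return bits.get(docId); }
        @Override public boolean isCacheable() { return true; }
    }

    /** Evaluated per lookup, so it has to be copied before it can be cached. */
    static final class LazyDocIdSet implements SimpleDocIdSet {
        private final IntPredicate matcher;
        LazyDocIdSet(IntPredicate matcher) { this.matcher = matcher; }
        @Override public boolean contains(int docId) { return matcher.test(docId); }
        @Override public boolean isCacheable() { return false; }
    }

    /** Toy cache: stores an entry once a key has been computed minHits times. */
    static final class SketchCache {
        private final Map<String, SimpleDocIdSet> cached = new HashMap<>();
        private final Map<String, Integer> seen = new HashMap<>();
        private final int maxDoc;
        private final int minHits;

        SketchCache(int maxDoc, int minHits) {
            this.maxDoc = maxDoc;
            this.minHits = minHits;
        }

        boolean isCached(String key) { return cached.containsKey(key); }

        SimpleDocIdSet get(String key, Supplier<SimpleDocIdSet> compute) {
            SimpleDocIdSet hit = cached.get(key);
            if (hit != null) {
                return hit;
            }
            SimpleDocIdSet fresh = compute.get();
            if (seen.merge(key, 1, Integer::sum) >= minHits) {
                // Reuse the instance when it is cacheable; otherwise copy it.
                cached.put(key, fresh.isCacheable() ? fresh : materialize(fresh));
            }
            return fresh;
        }

        /** Copies any set into a bit set by testing every doc id below maxDoc. */
        private SimpleDocIdSet materialize(SimpleDocIdSet set) {
            BitSet bits = new BitSet(maxDoc);
            for (int doc = 0; doc < maxDoc; doc++) {
                if (set.contains(doc)) {
                    bits.set(doc);
                }
            }
            return new BitSetDocIdSet(bits);
        }
    }

    public static void main(String[] args) {
        SketchCache cache = new SketchCache(16, 2);
        BitSet bits = new BitSet(16);
        bits.set(3);
        SimpleDocIdSet ready = new BitSetDocIdSet(bits);
        cache.get("range", () -> ready); // 1st hit: computed, not cached
        cache.get("range", () -> ready); // 2nd hit: computed, then cached
        System.out.println(cache.get("range", () -> ready) == ready);
    }
}
```

With minHits set to 1 this reduces to caching on the first hit. The separate
isCacheable() flag is what keeps "returning a DocIdSet" and "deciding whether
to cache it" from being tied to a single method.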
> Can LRUQueryCache reuse DocIdSets that are created by some queries anyway?
> --------------------------------------------------------------------------
>
> Key: LUCENE-7246
> URL: https://issues.apache.org/jira/browse/LUCENE-7246
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7246.patch, LUCENE-7246.patch
>
>
> Some queries need to create a DocIdSet to work. This is for instance the case
> with TermsQuery, multi-term queries, point-in-set queries and point range
> queries. We cache them more aggressively because these queries need to
> evaluate all matches on a segment before they can return a Scorer. But this
> can also be dangerous: if there is little reuse, then we keep converting the
> doc id sets that these queries create to another DocIdSet.
> This worries me a bit, e.g. for point range queries: they made numeric ranges
> faster in practice so I would not like caching to make them appear slower
> than they are when caching is disabled.
> So I would like to somehow bring back the optimization that we had in 1.x
> with DocIdSet.isCacheable so that we do not need to convert DocIdSet
> instances when we could just reuse existing instances.