[jira] [Commented] (LUCENE-10120) Lazy initialize FixedBitSet in LRUQueryCache

Adrien Grand (Jira) Tue, 09 Nov 2021 01:28:06 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441011#comment-17441011
 ]


Adrien Grand commented on LUCENE-10120:
---------------------------------------

bq. I suppose there could still be situations where a more complex query ends 
up matching all docs in the index and gets cached, but maybe it's pretty 
unlikely (e.g., disjunction of terms that results in all docs matching).

Right, this is the sort of thing I was thinking of. Maybe we could improve 
these cases via rewrite rules instead? For instance, we could implement 
{{PointRangeQuery#rewrite}} to rewrite to a {{MatchAllDocsQuery}} when the 
query range fully contains the indexed values and {{docCount == maxDoc}}? (And 
to a {{DocValuesFieldExistsQuery}} when the query range fully contains the 
indexed values and the field has doc values enabled.)
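
To make the rule concrete, here is a rough sketch of the decision only. It is not Lucene code: {{RangeRewriteSketch}}, {{Rewrite}} and {{decide}} are hypothetical names, values are simplified to 1D longs instead of packed byte arrays, and plain parameters stand in for the index statistics.

```java
/** Hypothetical model of the proposed rewrite decision; not Lucene's API. */
final class RangeRewriteSketch {

    /** Which simpler query the range query could be rewritten to, if any. */
    enum Rewrite { MATCH_ALL_DOCS, FIELD_EXISTS, NONE }

    /**
     * @param queryMin     inclusive lower bound of the query range
     * @param queryMax     inclusive upper bound of the query range
     * @param indexMin     smallest value indexed for the field (assumed known)
     * @param indexMax     largest value indexed for the field (assumed known)
     * @param docCount     number of docs that have at least one value
     * @param maxDoc       number of docs in the index
     * @param hasDocValues whether the field also has doc values
     */
    static Rewrite decide(long queryMin, long queryMax,
                          long indexMin, long indexMax,
                          int docCount, int maxDoc,
                          boolean hasDocValues) {
        // With no indexed values, indexMin/indexMax carry no meaning.
        if (docCount == 0) {
            return Rewrite.NONE;
        }
        // Every indexed value falls inside the query range.
        boolean coversAllValues = queryMin <= indexMin && indexMax <= queryMax;
        if (!coversAllValues) {
            return Rewrite.NONE;
        }
        // Every doc has a value and every value matches: all docs match.
        if (docCount == maxDoc) {
            return Rewrite.MATCH_ALL_DOCS;
        }
        // Every doc that has a value matches: same as "field exists".
        if (hasDocValues) {
            return Rewrite.FIELD_EXISTS;
        }
        return Rewrite.NONE;
    }

    public static void main(String[] args) {
        System.out.println(decide(0, 100, 10, 90, 5, 5, false));
        System.out.println(decide(0, 100, 10, 90, 3, 5, true));
    }
}
```

The order of the checks matters: the match-all case is tried first because it is the cheaper query, and the field-exists case only applies when some docs have no value.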

And likewise for pure disjunctions: if the field has 
{{sumDocFreq == docCount && docCount == maxDoc}} (meaning it's single-valued 
and dense) and the sum of the doc freqs of the terms is equal to {{maxDoc}}, 
then we could rewrite the disjunction to a {{MatchAllDocsQuery}} too?
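
As a sketch of that check, assuming the terms of the disjunction are distinct terms of the same field ({{DisjunctionRewriteSketch}} and {{matchesAllDocs}} are hypothetical names, and the statistics are passed in as plain numbers rather than read from an index):

```java
/** Hypothetical model of the proposed disjunction check; not Lucene's API. */
final class DisjunctionRewriteSketch {

    /**
     * @param sumDocFreq   sum of doc freqs over all terms of the field
     * @param docCount     number of docs with at least one term in the field
     * @param maxDoc       number of docs in the index
     * @param termDocFreqs doc freqs of the distinct terms in the disjunction
     * @return true if the disjunction must match every doc
     */
    static boolean matchesAllDocs(long sumDocFreq, int docCount, int maxDoc,
                                  long[] termDocFreqs) {
        // Single-valued: each doc contributes exactly one posting.
        // Dense: every doc has a value for the field.
        boolean singleValuedAndDense = sumDocFreq == docCount && docCount == maxDoc;
        if (!singleValuedAndDense) {
            return false;
        }
        // On a single-valued field, distinct terms have disjoint postings,
        // so the doc freqs add up to the number of matching docs.
        long total = 0;
        for (long docFreq : termDocFreqs) {
            total += docFreq;
        }
        return total == maxDoc;
    }

    public static void main(String[] args) {
        System.out.println(matchesAllDocs(10, 10, 10, new long[] {4, 6}));
    }
}
```

The single-valued condition is what makes the sum meaningful: on a multi-valued field the same doc could be counted under several terms, so the sum could reach {{maxDoc}} without every doc matching.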

Then there wouldn't be many cases when this PR would actually kick in?

bq. when our program wants to use PointRangeQuery only to collect the number of docs

If the use-case you are interested in is counting matches for range queries on 
1D fields, there is an open issue around changing the Points API and then 
implementing {{Weight#count}} on {{PointRangeQuery}} so that we could count 
matches for single-valued fields without running the query. See LUCENE-9619 
and LUCENE-9820. Once we have this, we will be able to count matches of many 
range queries on 1D points without having to collect matches: we will get the 
count by just looking at the index of the BKD tree and counting matches on the 
two leaf nodes that partially match the query.
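
To illustrate the idea, here is a toy model, not the BKD implementation: {{LeafCountSketch}} is a hypothetical class that keeps one long value per doc in sorted order, cuts the array into fixed-size "leaves", and walks all leaves linearly where a real tree would descend from the root. Leaves fully inside the range contribute their size without being visited, and only the leaves that straddle a bound are scanned.

```java
import java.util.Arrays;

/** Toy model of counting a 1D range from leaf metadata; not Lucene's BKD code. */
final class LeafCountSketch {
    private final long[] sorted;  // one value per doc, sorted ascending
    private final int leafSize;   // number of values per leaf (must be > 0)
    int leavesScanned;            // leaves scanned by the last count() call

    LeafCountSketch(long[] values, int leafSize) {
        this.sorted = values.clone();
        Arrays.sort(this.sorted);
        this.leafSize = leafSize;
    }

    /** Counts values in the inclusive range [lo, hi]. */
    int count(long lo, long hi) {
        leavesScanned = 0;
        int total = 0;
        for (int start = 0; start < sorted.length; start += leafSize) {
            int end = Math.min(start + leafSize, sorted.length);
            long leafMin = sorted[start];
            long leafMax = sorted[end - 1];
            if (leafMax < lo || leafMin > hi) {
                continue;                 // leaf is outside the range
            }
            if (lo <= leafMin && leafMax <= hi) {
                total += end - start;     // leaf is fully inside: use its size
                continue;
            }
            leavesScanned++;              // leaf straddles a bound: scan it
            for (int i = start; i < end; i++) {
                if (sorted[i] >= lo && sorted[i] <= hi) {
                    total++;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] values = new long[20];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;
        }
        LeafCountSketch sketch = new LeafCountSketch(values, 4);
        System.out.println(sketch.count(2, 13) + " matches, "
            + sketch.leavesScanned + " leaves scanned");
    }
}
```

Because the values are sorted, at most one leaf can straddle the lower bound and at most one the upper bound, which is why only two leaves ever need to be scanned in this model.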

> Lazy initialize FixedBitSet in LRUQueryCache
> --------------------------------------------
>
>                 Key: LUCENE-10120
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10120
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: main (10.0)
>            Reporter: Lu Xugang
>            Priority: Major
>         Attachments: 1.png, LUCENE-10120.patch
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Based on how DocsWithFieldSet collects docIds, maybe we could cache the 
> docIdSet in a similar way in 
> *LRUQueryCache#cacheIntoBitSet(BulkScorer scorer, int maxDoc)* when the 
> docIdSet is dense.
> This way, we would not always initialize a huge FixedBitSet, which is 
> sometimes unnecessary when maxDoc is large.
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
