Shawn Heisey commented on SOLR-9764:

Something just mentioned for memory efficiency on the mailing list was a 
roaring bitmap.  Lucene does have this implemented already -- RoaringDocIdSet.  
I do not know how it would perform when actually used as a filterCache entry, 
compared to the current bitset implementation.

> Design a memory efficient DocSet if a query returns all docs
> ------------------------------------------------------------
>                 Key: SOLR-9764
>                 URL: https://issues.apache.org/jira/browse/SOLR-9764
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Michael Sun
>         Attachments: SOLR-9764.patch, SOLR-9764.patch, SOLR-9764.patch, 
> SOLR-9764.patch, SOLR-9764.patch, SOLR_9764_no_cloneMe.patch
> In some use cases, particularly use cases with time series data, using 
> collection alias and partitioning data into multiple small collections using 
> timestamp, a filter query can match all documents in a collection. Currently 
> BitDocSet is used which contains a large array of long integers with every 
> bits set to 1. After querying, the resulted DocSet saved in filter cache is 
> large and becomes one of the main memory consumers in these use cases.
> For example. suppose a Solr setup has 14 collections for data in last 14 
> days, each collection with one day of data. A filter query for last one week 
> data would result in at least six DocSet in filter cache which matches all 
> documents in six collections respectively.   
> This is to design a new DocSet that is memory efficient for such a use case.  
> The new DocSet removes the large array, reduces memory usage and GC pressure 
> without losing advantage of large filter cache.
> In particular, for use cases when using time series data, collection alias 
> and partition data into multiple small collections using timestamp, the gain 
> can be large.
> For further optimization, it may be helpful to design a DocSet with run 
> length encoding. Thanks [~mmokhtar] for suggestion. 

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to