[ 
https://issues.apache.org/jira/browse/LUCENE-5101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13707889#comment-13707889
 ] 

Robert Muir commented on LUCENE-5101:
-------------------------------------

{quote}
Maybe we could use these numbers to have better defaults in CWF? (and only use 
FixedBitSet for dense sets for example)
{quote}

+1: we should have better defaults. Ideally we would use DISI.cost() to 
estimate the sparsity.

One problem is that a lot of the costly filters people want to cache have a 
crap cost() implementation.
For example, MultiTermQueryWrapperFilter could instead call getAndSet() as it 
fills its bitset, count the newly set bits, and return a DISI with an accurate 
cost().

Or, for now, we could also check firstDocID...
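
To make the idea concrete, here is a rough sketch of choosing a representation from a cost estimate. It is plain Java with no Lucene dependency: CostBasedCacheChoice, isDense, cache and the 1/32 threshold are all hypothetical names and values picked for illustration, estimatedCost stands in for whatever DISI.cost() would report, java.util.BitSet stands in for FixedBitSet, and a sorted int[] stands in for some sparse set.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical helper, not part of Lucene: picks a cached form from a cost estimate. */
final class CostBasedCacheChoice {
  /**
   * Assumed break-even point: a bitset costs 1 bit per document in the segment,
   * a plain int list costs 32 bits per matching document, so the int list is
   * smaller while fewer than maxDoc / 32 documents match.
   */
  static final int BITS_PER_SPARSE_ENTRY = 32;

  /** Returns true when the estimate says a bitset is the smaller choice. */
  static boolean isDense(long estimatedCost, int maxDoc) {
    return estimatedCost >= maxDoc / BITS_PER_SPARSE_ENTRY;
  }

  /**
   * Builds the cached form: a BitSet for dense sets, a copy of the sorted
   * doc ID array for sparse ones. sortedDocs plays the role of the iterator.
   */
  static Object cache(int[] sortedDocs, long estimatedCost, int maxDoc) {
    if (isDense(estimatedCost, maxDoc)) {
      BitSet bits = new BitSet(maxDoc);
      for (int doc : sortedDocs) {
        bits.set(doc);
      }
      return bits;
    }
    return Arrays.copyOf(sortedDocs, sortedDocs.length);
  }
}
```

The choice is only as good as the estimate: if cost() overstates the number of matches, a sparse set still ends up in a bitset, which is the problem with the inaccurate implementations mentioned above.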

                
> make it easier to plugin different bitset implementations to 
> CachingWrapperFilter
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-5101
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5101
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>
> Currently this is possible, but it's not so friendly:
> {code}
>   protected DocIdSet docIdSetToCache(DocIdSet docIdSet, AtomicReader reader) throws IOException {
>     if (docIdSet == null) {
>       // this is better than returning null, as the nonnull result can be cached
>       return EMPTY_DOCIDSET;
>     } else if (docIdSet.isCacheable()) {
>       return docIdSet;
>     } else {
>       final DocIdSetIterator it = docIdSet.iterator();
>       // null is allowed to be returned by iterator(),
>       // in this case we wrap with the sentinel set,
>       // which is cacheable.
>       if (it == null) {
>         return EMPTY_DOCIDSET;
>       } else {
> /* INTERESTING PART */
>         final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>         bits.or(it);
>         return bits;
> /* END INTERESTING PART */
>       }
>     }
>   }
> {code}
> Is there any value to having all this other logic in the protected API? It 
> seems like something that's not useful for a subclass... Maybe this stuff can 
> become final, and "INTERESTING PART" calls a simpler method, something like:
> {code}
> protected DocIdSet cacheImpl(DocIdSetIterator iterator, AtomicReader reader) {
>   final FixedBitSet bits = new FixedBitSet(reader.maxDoc());
>   bits.or(iterator);
>   return bits;
> }
> {code}
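
A rough sketch of how that split could look, again in plain Java without Lucene. Only the name cacheImpl comes from the proposal; DocSet, CachingSketch, SortedArrayCachingSketch and docSetToCache are hypothetical stand-ins, PrimitiveIterator.OfInt plays the role of DocIdSetIterator, and the subclass assumes doc IDs arrive in ascending order.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.PrimitiveIterator;

/** Hypothetical stand-in for DocIdSet: just a membership test. */
interface DocSet {
  boolean contains(int doc);
}

/** Hypothetical stand-in for the caching filter. */
class CachingSketch {
  /** Plays the role of the EMPTY_DOCIDSET sentinel. */
  static final DocSet EMPTY = doc -> false;

  /** The null handling is final, so subclasses cannot get it wrong. */
  final DocSet docSetToCache(PrimitiveIterator.OfInt iterator, int maxDoc) {
    if (iterator == null) {
      return EMPTY;
    }
    return cacheImpl(iterator, maxDoc);
  }

  /** The single extension point; the default fills a bitset. */
  protected DocSet cacheImpl(PrimitiveIterator.OfInt iterator, int maxDoc) {
    BitSet bits = new BitSet(maxDoc);
    while (iterator.hasNext()) {
      bits.set(iterator.nextInt());
    }
    return bits::get;
  }
}

/** A subclass that only swaps the representation: a sorted int array. */
class SortedArrayCachingSketch extends CachingSketch {
  @Override
  protected DocSet cacheImpl(PrimitiveIterator.OfInt iterator, int maxDoc) {
    int[] docs = new int[8];
    int size = 0;
    while (iterator.hasNext()) {
      if (size == docs.length) {
        docs = Arrays.copyOf(docs, size * 2);
      }
      docs[size++] = iterator.nextInt();
    }
    // Trim to the used prefix; binary search relies on ascending order.
    final int[] sorted = Arrays.copyOf(docs, size);
    return doc -> Arrays.binarySearch(sorted, doc) >= 0;
  }
}
```

With this shape a subclass overrides one small method and never touches the null and sentinel handling.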
