[ 
https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601504#action_12601504
 ] 

Paul Elschot commented on LUCENE-1296:
--------------------------------------

I tried to come up with a sensible performance test to determine a good 
criterion for choosing between OpenBitSet and SortedVIntList as the DocIdSet 
data structure to be cached.
The patch has such a criterion in the docIdSetToCache() method of 
CachingWrapperFilter, but it is based only on byte size, and it favours 
SortedVIntList when it is definitely more compact than OpenBitSet.

The current criterion is to use (cardinality (= number of bits set in the 
OpenBitSet) < maxDocs/9) as the test to prefer SortedVIntList over OpenBitSet 
for caching. The constant 9 might be replaced by a configuration parameter to 
allow easy performance experiments there. It could be that a larger value than 
9 turns out to be "optimal" in runtime.
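
To make the criterion concrete, here is a minimal sketch with the constant 9 
turned into a parameter. The class name, method name and parameter names are 
hypothetical and are not taken from the patch; only the comparison itself 
(cardinality < maxDocs/divisor) comes from the description above.

```java
/** Hypothetical helper for experiments; not the code from the patch. */
public final class DocIdSetChoice {
    private DocIdSetChoice() {}

    /**
     * Returns true when the compact list should be preferred for caching,
     * i.e. when cardinality < maxDoc / divisor (integer division).
     *
     * @param cardinality number of bits set in the bit set
     * @param maxDoc      number of documents the bit set covers
     * @param divisor     tuning constant (assumed configurable; 9 in the text above)
     */
    public static boolean preferSortedVIntList(long cardinality, int maxDoc, int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        return cardinality < maxDoc / divisor;
    }
}
```

With maxDoc = 1000 and divisor = 9 the threshold is 1000/9 = 111, so a 
cardinality of 110 prefers the list and 111 prefers the bit set. Raising the 
divisor makes the list the choice only for sparser sets.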

In some cases OpenBitSet can be faster on skipTo(int docNum) than 
SortedVIntList, even when SortedVIntList is more compact. As Filters can be 
expected to use skipTo() heavily, this could be important for performance.
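
A rough illustration of why this can happen, as a sketch under simplifying 
assumptions: java.util.BitSet stands in for OpenBitSet, and a plain int[] of 
gaps stands in for the VInt-encoded deltas of SortedVIntList. The class and 
method names are made up for this example, and both methods simply return the 
first doc id >= target (or -1) rather than following the real iterator API.

```java
import java.util.BitSet;

/** Illustrative stand-ins only; not the Lucene implementations. */
public final class SkipToSketch {
    private SkipToSketch() {}

    /** Bit set: can start looking directly at the position of target. */
    public static int skipToBits(BitSet bits, int target) {
        return bits.nextSetBit(target); // -1 when no doc id >= target exists
    }

    /**
     * Gap list: gaps[i] is the distance from the previous doc id (the first
     * gap is measured from 0). Every gap before the target must be added up.
     * This version scans from the start for simplicity; a real iterator would
     * resume from its current position but still decode each gap in between.
     */
    public static int skipToGaps(int[] gaps, int target) {
        int doc = 0;
        for (int gap : gaps) {
            doc += gap;
            if (doc >= target) {
                return doc;
            }
        }
        return -1; // ran out of doc ids
    }
}
```

For the doc ids 3, 10 and 500 (gaps 3, 7, 490), both methods return 500 for a 
target of 11, but the gap version has to walk through every earlier entry to 
get there.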

Even though it might be possible to measure the skipTo() performance 
directly, the effect of the more compact cached data structure of 
SortedVIntList on garbage collection is (pretty close to) impossible to measure 
in a simple test case.

Eks Dev had some interesting results there in the very early stages of 
LUCENE-584 (September 2006), so I wonder whether these results could be 
confirmed somehow using the patch here and the current trunk.

Comments?




> Allow use of compact DocIdSet in CachingWrapperFilter
> -----------------------------------------------------
>
>                 Key: LUCENE-1296
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1296
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Paul Elschot
>            Assignee: Michael Busch
>            Priority: Minor
>         Attachments: cachedFilter20080529.patch
>
>
> Extends CachingWrapperFilter with a protected method to determine the 
> DocIdSet to be cached.


