[
https://issues.apache.org/jira/browse/LUCENE-1296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12601504#action_12601504
]
Paul Elschot commented on LUCENE-1296:
--------------------------------------
I tried to come up with a sensible performance test to determine a good
criterion for choosing between OpenBitSet and SortedVIntList as the DocIdSet
supporting data structure to be cached.
There is a criterion for this in the patch, in the docIdSetToCache() method of
CachingWrapperFilter, but it is based only on byte size, and it favours
SortedVIntList when it is definitely more compact than OpenBitSet.
The current criterion is to use (cardinality (= number of bits set in
OpenBitSet) < maxDocs/9) as the test to prefer SortedVIntList over OpenBitSet
for caching. The constant 9 might be replaced by a configuration parameter to
allow easy performance experiments there. It could be that a value larger than
9 turns out to be "optimal" in runtime.
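As a minimal sketch of the criterion described above, with the constant turned
into a parameter: the class name DocIdSetChoice, the field divisor, and the
method preferSortedVIntList() are hypothetical and not taken from the patch;
only the comparison cardinality < maxDocs/9 comes from the description here.

```java
/**
 * Sketch of a configurable size-based choice between a compact sorted list
 * and a bit set. All names in this class are hypothetical.
 */
public final class DocIdSetChoice {
    /** Tunable divisor; the value 9 is the constant mentioned in the comment. */
    private final int divisor;

    public DocIdSetChoice(int divisor) {
        if (divisor <= 0) {
            throw new IllegalArgumentException("divisor must be positive");
        }
        this.divisor = divisor;
    }

    /**
     * Returns true when the compact list should be preferred for caching,
     * i.e. when cardinality < maxDocs / divisor (integer division).
     */
    public boolean preferSortedVIntList(long cardinality, int maxDocs) {
        return cardinality < maxDocs / divisor;
    }
}
```

With a larger divisor the threshold drops, so the bit set is chosen more often;
for example, new DocIdSetChoice(9).preferSortedVIntList(99, 900) is true while
the same call with cardinality 100 is false.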
In some cases OpenBitSet can be faster on skipTo(int docNum) than
SortedVIntList, even when SortedVIntList is more compact. As Filters can be
expected to use skipTo() heavily, this could be important for performance.
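To illustrate why a bit set can win on skipTo(), here is a rough sketch that
does not use the Lucene classes: java.util.BitSet stands in for OpenBitSet,
and a plain array of doc id deltas stands in for the byte encoding of
SortedVIntList. The class and method names are hypothetical, and both methods
are stateless simplifications of a real iterator, which would resume from its
current position.

```java
import java.util.BitSet;

/** Illustration only: contrasts direct lookup with sequential decoding. */
public final class SkipToSketch {

    /** Bit set: the search starts directly at the target position. */
    static int skipToBitSet(BitSet bits, int target) {
        return bits.nextSetBit(target); // -1 when no doc >= target exists
    }

    /**
     * Delta-encoded sorted list: every earlier entry has to be decoded
     * before the first doc >= target is found. The first delta is the
     * first doc id itself.
     */
    static int skipToDeltas(int[] deltas, int target) {
        int doc = 0;
        for (int delta : deltas) {
            doc += delta;
            if (doc >= target) {
                return doc;
            }
        }
        return -1; // no doc >= target exists
    }
}
```

The delta list needs work proportional to the number of entries skipped over,
while the bit set lookup depends on the distance to the next set bit, so which
one is faster depends on how dense the set is and how far each skip goes.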
Even though it might be possible to measure the skipTo() performance
directly, the effect of the more compact cached data structure of
SortedVIntList on garbage collection is (pretty close to) impossible to measure
in a simple test case.
Eks Dev had some interesting results there in the very early stages of
LUCENE-584 (September 2006), so I wonder whether these results could be
confirmed somehow using the patch here and the current trunk.
Comments?
> Allow use of compact DocIdSet in CachingWrapperFilter
> -----------------------------------------------------
>
> Key: LUCENE-1296
> URL: https://issues.apache.org/jira/browse/LUCENE-1296
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Search
> Reporter: Paul Elschot
> Assignee: Michael Busch
> Priority: Minor
> Attachments: cachedFilter20080529.patch
>
>
> Extends CachingWrapperFilter with a protected method to determine the
> DocIdSet to be cached.