[
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-5425:
-------------------------------
Attachment: LUCENE-5425.patch
* I implemented FixedBitSet.iterator() to return what I think is a more
optimized version, rather than OpenBitSetIterator. I also saved two additions
in nextSetBit (and in the iterator). I think we may want to commit those two
changes irrespective of whether we cut over to a general DocIdSet in
MatchingDocs.
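
To illustrate the idea of an iterator that walks the bit set's words directly, here is a minimal sketch. It is not the code in the attached patch and not Lucene's FixedBitSet: the names SketchBitSet and DocIterator are hypothetical, and it only assumes a long[]-backed set whose size is known up front.

```java
/** Hypothetical stand-in for a fixed-size bit set; NOT Lucene's FixedBitSet. */
final class SketchBitSet {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final long[] words;
    private final int numBits;

    SketchBitSet(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6];
    }

    /** Sets a bit without growing the set; the caller keeps index < numBits. */
    void set(int index) {
        words[index >> 6] |= 1L << index; // Java uses only the low 6 bits of the shift count
    }

    /** Returns the first set bit at or after index, or -1 if there is none. */
    int nextSetBit(int index) {
        if (index >= numBits) {
            return -1;
        }
        int i = index >> 6;
        long word = words[i] >>> index; // drop the bits below index in the current word
        if (word != 0) {
            return index + Long.numberOfTrailingZeros(word);
        }
        while (++i < words.length) {
            word = words[i];
            if (word != 0) {
                return (i << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return -1;
    }

    /** Iterator that reads the backing words through nextSetBit, with no wrapper object in between. */
    final class DocIterator {
        private int doc = -1;

        int nextDoc() {
            if (doc == NO_MORE_DOCS) {
                return doc;
            }
            int next = nextSetBit(doc + 1);
            doc = (next == -1) ? NO_MORE_DOCS : next;
            return doc;
        }
    }

    DocIterator iterator() {
        return new DocIterator();
    }

    public static void main(String[] args) {
        SketchBitSet bits = new SketchBitSet(200);
        bits.set(3);
        bits.set(64);
        bits.set(130);
        DocIterator it = bits.iterator();
        // Prints 3, 64 and 130, one per line.
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            System.out.println(d);
        }
    }
}
```

Because the number of bits is fixed, neither set() nor nextSetBit() needs a capacity check that could grow the array, which is the property the bullet above relies on.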
I then reviewed the patch more carefully and I noticed a couple of issues, all
fixed in this patch:
* FacetsCollector.createHitSet() returned a MutableDocIdSet which uses
OpenBitSet and not FixedBitSet internally. This affects .add() too, especially
as it used set() and not fastSet(). So I modified it to use FixedBitSet, as the
number of bits is known in advance.
* I moved MutableDocIdSet inside FacetsCollector and changed it to not extend
DocIdSet, but rather expose two methods: add() and getDocs(). The latter
returns a DocIdSet.
** That way, MatchingDocs declares a DocIdSet rather than a MutableDocIdSet,
which makes more sense as we don't want the "users" of MatchingDocs to be able
to modify the doc id set.
* I noticed many places in the code still included a {{++doc}} even though it's
not needed anymore, so I removed them.
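
The split between a write-side set and a read-only view described in the list above can be sketched roughly as follows. This is only an illustration of the design, not the patch itself: ReadOnlyDocs, MutableDocs and SketchCollector are hypothetical names, and java.util.BitSet stands in for FixedBitSet so the example runs without Lucene on the classpath.

```java
import java.util.BitSet;

/** Hypothetical read-only view; stands in for the DocIdSet that MatchingDocs would expose. */
interface ReadOnlyDocs {
    boolean contains(int doc);

    /** First doc at or after from, or -1 if there is none. */
    int nextDoc(int from);
}

/** Hypothetical write side; deliberately does NOT extend ReadOnlyDocs. */
abstract class MutableDocs {
    abstract void add(int doc);

    abstract ReadOnlyDocs getDocs();
}

class SketchCollector {
    /** Overridable factory, so a subclass can supply a different (for example pooled) hit set. */
    protected MutableDocs createHitSet(int maxDoc) {
        // maxDoc is known in advance, so the backing set is sized once here.
        final BitSet bits = new BitSet(maxDoc);
        return new MutableDocs() {
            @Override
            void add(int doc) {
                bits.set(doc);
            }

            @Override
            ReadOnlyDocs getDocs() {
                return new ReadOnlyDocs() {
                    @Override
                    public boolean contains(int doc) {
                        return bits.get(doc);
                    }

                    @Override
                    public int nextDoc(int from) {
                        return bits.nextSetBit(from);
                    }
                };
            }
        };
    }

    /** Collects the given hits and hands back only the read-only view. */
    ReadOnlyDocs collect(int maxDoc, int... hits) {
        MutableDocs hitSet = createHitSet(maxDoc);
        for (int doc : hits) {
            hitSet.add(doc);
        }
        return hitSet.getDocs();
    }

    public static void main(String[] args) {
        ReadOnlyDocs docs = new SketchCollector().collect(100, 5, 42);
        System.out.println(docs.contains(42)); // prints true
        System.out.println(docs.nextDoc(6));   // prints 42
    }
}
```

Consumers of the result only see contains() and nextDoc(), so they have no way to call add() on the collected set.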
I wonder if the 10% loss that Mike saw was related to both the usage of
OpenBitSet (which affects collection) and OpenBitSetIterator (which affects
accumulation), and how this patch will perform vs. trunk. I will try to set up
my environment to run the facets benchmark, but Mike, if you can repeat the
test w/ this patch and post the results, that would be great.
> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
> Key: LUCENE-5425
> URL: https://issues.apache.org/jira/browse/LUCENE-5425
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 4.6
> Reporter: John Wang
> Attachments: LUCENE-5425.patch, facetscollector.patch,
> facetscollector.patch, fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining
> the current behavior.