[ https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-5425:
-------------------------------

    Attachment: LUCENE-5425.patch

* I implemented FixedBitSet.iterator() to return what I think is a more
optimized iterator, rather than an OpenBitSetIterator. I also saved two
additions in nextSetBit (and in the iterator). I think we may want to commit
these two changes regardless of whether we cut over to a general DocIdSet in
MatchingDocs.
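To illustrate the kind of word-at-a-time scanning a FixedBitSet-backed iterator can do, here is a minimal, self-contained Java sketch. It is not Lucene's FixedBitSet code and does not reproduce the exact two additions saved in the patch; the class name FixedBitsSketch and its Iter type are hypothetical. The iterator keeps the current 64-bit word and clears its lowest set bit on each step, so it does not re-derive the word index and bit offset from the doc ID on every call, as a loop driven by nextSetBit does.

```java
// Hypothetical fixed-size bit set in the spirit of FixedBitSet (not Lucene's code).
final class FixedBitsSketch {
    // Same sentinel value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final long[] words;
    private final int numBits;

    FixedBitsSketch(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6]; // one long per 64 bits
    }

    // Caller guarantees 0 <= index < numBits; no capacity check is needed.
    void set(int index) {
        words[index >> 6] |= 1L << index; // Java masks a long shift count to 6 bits
    }

    /** Returns the first set bit at or after index, or -1 if there is none. */
    int nextSetBit(int index) {
        if (index >= numBits) return -1;
        int i = index >> 6;
        long word = words[i] >>> index; // drops bits below index within this word
        if (word != 0) {
            return index + Long.numberOfTrailingZeros(word);
        }
        while (++i < words.length) {
            word = words[i];
            if (word != 0) {
                return (i << 6) + Long.numberOfTrailingZeros(word);
            }
        }
        return -1;
    }

    /** Iterator that walks the words directly instead of calling nextSetBit. */
    final class Iter {
        private int wordIndex = -1;
        private long word = 0; // bits of words[wordIndex] not yet returned

        int nextDoc() {
            while (word == 0) {
                if (++wordIndex >= words.length) return NO_MORE_DOCS;
                word = words[wordIndex];
            }
            int bit = Long.numberOfTrailingZeros(word);
            word &= word - 1; // clear the lowest set bit
            return (wordIndex << 6) + bit;
        }
    }

    Iter iterator() {
        return new Iter();
    }
}
```

In this sketch nextSetBit signals the end with -1 while the iterator returns NO_MORE_DOCS, so loops over each need a different termination test.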

I then reviewed the patch more carefully and noticed a couple of issues, all
fixed in this patch:

* FacetsCollector.createHitSet() returned a MutableDocIdSet that used
OpenBitSet rather than FixedBitSet internally. This also affected .add(),
especially since it called set() rather than fastSet(). So I changed it to use
FixedBitSet, since the number of bits (maxDoc) is known in advance.

* I moved MutableDocIdSet inside FacetsCollector and changed it to no longer
extend DocIdSet; instead it exposes two methods, add() and getDocs(), where the
latter returns a DocIdSet.
** That way, MatchingDocs can declare a DocIdSet rather than a MutableDocIdSet,
which makes more sense since we don't want "users" of MatchingDocs to be able
to modify the doc id set.

* I noticed that many places in the code still had a {{++doc}} even though it
is no longer needed, so I removed them.
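The set() vs fastSet() point can be illustrated with two hypothetical classes. GrowableBits stands in for a bit set whose set() must check capacity and may reallocate (roughly what OpenBitSet's expanding set() does), while PresizedBits is sized up front, as a FixedBitSet can be when maxDoc is known, so its fastSet() is just an index computation and an OR. This is a sketch under those assumptions, not the actual implementation of either Lucene class.

```java
import java.util.Arrays;

// Hypothetical growable bit set: every set() pays a capacity check.
final class GrowableBits {
    private long[] words = new long[1];

    void set(int index) {
        int wordNum = index >> 6;
        if (wordNum >= words.length) {
            // Extra branch on every call, plus an occasional array copy.
            words = Arrays.copyOf(words, Math.max(wordNum + 1, words.length * 2));
        }
        words[wordNum] |= 1L << index;
    }

    boolean get(int index) {
        int wordNum = index >> 6;
        return wordNum < words.length && (words[wordNum] & (1L << index)) != 0;
    }
}

// Hypothetical pre-sized bit set: capacity is fixed when the size is known.
final class PresizedBits {
    private final long[] words;

    PresizedBits(int numBits) { // e.g. numBits = maxDoc of the segment
        words = new long[(numBits + 63) >>> 6];
    }

    // Caller guarantees 0 <= index < numBits, so no capacity check is done.
    void fastSet(int index) {
        words[index >> 6] |= 1L << index;
    }

    boolean get(int index) {
        return (words[index >> 6] & (1L << index)) != 0;
    }
}
```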
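The add()/getDocs() split described above can be sketched with hypothetical types: DocsBuilder plays the role of the class nested in FacetsCollector, ReadOnlyDocs stands in for DocIdSet, and java.util.BitSet replaces FixedBitSet to keep the example self-contained. Only code holding the builder can add documents; MatchingDocsSketch receives the read-only view.

```java
import java.util.BitSet;

// Hypothetical read-only view handed to consumers (stand-in for DocIdSet).
interface ReadOnlyDocs {
    boolean contains(int doc);
    int cardinality();
}

// Hypothetical stand-in for the builder nested in FacetsCollector:
// only the collector calls add(); consumers only see getDocs().
final class DocsBuilder {
    private final BitSet bits;

    DocsBuilder(int maxDoc) {
        bits = new BitSet(maxDoc);
    }

    void add(int doc) {
        bits.set(doc);
    }

    ReadOnlyDocs getDocs() {
        // The view wraps the same bits (no copy) and offers no way to mutate them.
        return new ReadOnlyDocs() {
            @Override public boolean contains(int doc) { return bits.get(doc); }
            @Override public int cardinality() { return bits.cardinality(); }
        };
    }
}

// Consumers declare only the read-only type, mirroring MatchingDocs holding a DocIdSet.
final class MatchingDocsSketch {
    final ReadOnlyDocs docs;

    MatchingDocsSketch(ReadOnlyDocs docs) {
        this.docs = docs;
    }
}
```

In this sketch the view wraps the builder's bits without copying, so handing it out costs nothing extra.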
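Assuming the leftover {{++doc}} statements came from nextSetBit-style loops (an assumption; the patch itself is not shown here), the difference between the two loop shapes looks roughly like this, with java.util.BitSet standing in for Lucene's bit sets and the DocLoops class being hypothetical. In the first loop the caller must advance past the current doc itself; in the second, the iterator advances on its own, so an extra {{++doc}} is redundant.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

final class DocLoops {
    // nextSetBit-style loop: the caller advances past the current doc (++doc);
    // without that step the loop would return the same doc forever.
    static int sumWithNextSetBit(BitSet bits) {
        int sum = 0;
        int doc = 0;
        while ((doc = bits.nextSetBit(doc)) != -1) {
            sum += doc;
            ++doc;
        }
        return sum;
    }

    // Iterator-style loop: nextInt() advances by itself, so no ++doc is needed.
    static int sumWithIterator(BitSet bits) {
        int sum = 0;
        PrimitiveIterator.OfInt it = bits.stream().iterator();
        while (it.hasNext()) {
            int doc = it.nextInt();
            sum += doc;
        }
        return sum;
    }
}
```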

I wonder whether the 10% loss that Mike saw was due to both the use of
OpenBitSet (which affects collection) and OpenBitSetIterator (which affects
accumulation), and how this patch will perform vs. trunk. I will try to set up
my environment to run the facets benchmark, but Mike, if you can repeat the
test with this patch and post the results, that would be great.

> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
>                 Key: LUCENE-5425
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5425
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 4.6
>            Reporter: John Wang
>         Attachments: LUCENE-5425.patch, facetscollector.patch, facetscollector.patch, fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query.
> For large indexes, where maxDoc is large, creating a bitset of maxDoc bits
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining the
> current behavior.
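Making an allocation customizable while keeping the current behavior is commonly done with a protected factory method. The sketch below uses hypothetical names (HitCollectorSketch, createHitSet(int), ReusingHitCollector) and java.util.BitSet in place of FixedBitSet; it is not the attached patch. The default implementation allocates a fresh set per call, and the subclass shows one possible override that reuses a single set to cut garbage.

```java
import java.util.BitSet;

// Hypothetical collector that allocates its hit set through an overridable hook.
class HitCollectorSketch {
    // Default behavior: a fresh maxDoc-sized set on every call.
    protected BitSet createHitSet(int maxDoc) {
        return new BitSet(maxDoc);
    }

    BitSet collect(int maxDoc, int[] hits) {
        BitSet set = createHitSet(maxDoc);
        for (int doc : hits) {
            set.set(doc);
        }
        return set;
    }
}

// One possible override: reuse a single set across calls to reduce garbage.
// java.util.BitSet grows automatically if a later call needs more bits.
class ReusingHitCollector extends HitCollectorSketch {
    private BitSet reused;

    @Override
    protected BitSet createHitSet(int maxDoc) {
        if (reused == null) {
            reused = new BitSet(maxDoc);
        } else {
            reused.clear(); // wipe hits from the previous call
        }
        return reused;
    }
}
```

Reuse like this is only safe if each result is fully consumed before the next collect() call and the collector is not shared across threads.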



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
