On Sat, Aug 1, 2015 at 2:44 PM, Toke Eskildsen <[email protected]> wrote:
> Before I open a JIRA, I would like to run a sanity check here.
>
> My understanding:
>
> The DocSetCollector is used e.g. when the ResponseBuilder.isNeedDocSet is 
> true. This is the case for e.g. field faceting. The DocSetCollector is 
> optimistic and works from the assumption that the result set will be less 
> than 1/64th of maxDoc. It does this by allocating an int[maxDoc/64], which 
> takes up maxDoc/16 bytes. docIDs are collected in this array and when the 
> DocSet is to be returned, the array is wrapped in a SortedIntDocSet, which 
> reduces it to int[hits].
>
> If the result set exceeds maxDoc/64, a FixedBitSet, which takes up maxDoc/8 
> bytes, is created and all the values are copied from the int-array. The 
> int-array is not freed, so temporary overhead during collection is now 
> maxDoc/16 + maxDoc/8 bytes. When the collection has finished, a BitDocSet 
> (maxDoc/8 bytes) is created from the FixedBitSet.
>
> The JavaDoc indicates that the reason for this solution is to avoid very 
> sparse bitmaps from small result sets with a FixedBitSet-only strategy. As 
> the memory overhead for the current solution varies from 1/2 to 3/2 of the 
> pure FixedBitSet-only solution, I assume that the problem with sparse bitmaps 
> is not memory but processing time?

You make a good point on the relative sizes of the sets.
We could free the int[] at the same time as the FixedBitSet is allocated.
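
A minimal sketch of that idea, not the actual DocSetCollector code. The
class name HybridDocCollector and its methods are hypothetical, and
java.util.BitSet stands in for Lucene's FixedBitSet so the example is
self-contained. The int[] still has to survive the copy into the bitmap,
so the two structures overlap only for the duration of that copy instead
of for the rest of the collection.

```java
import java.util.BitSet;

/** Hypothetical sketch: sparse int[] first, bitmap after overflow. */
final class HybridDocCollector {
    private final int maxDoc;
    private int[] scratch; // sparse-phase buffer; null once we go dense
    private BitSet bits;   // dense-phase bitmap; null while still sparse
    private int pos;       // number of docIDs stored in scratch

    HybridDocCollector(int maxDoc) {
        this.maxDoc = maxDoc;
        // Optimistic guess from the thread: fewer than maxDoc/64 hits.
        this.scratch = new int[Math.max(1, maxDoc >> 6)];
    }

    void collect(int doc) {
        if (bits != null) {
            bits.set(doc);
            return;
        }
        if (pos < scratch.length) {
            scratch[pos++] = doc;
            return;
        }
        // Overflow: allocate the bitmap, copy the collected docIDs,
        // then drop the int[] so the GC can reclaim it right away.
        bits = new BitSet(maxDoc);
        for (int i = 0; i < pos; i++) {
            bits.set(scratch[i]);
        }
        scratch = null;
        bits.set(doc);
    }

    boolean isDense() { return bits != null; }

    boolean holdsScratch() { return scratch != null; }

    int size() { return bits != null ? bits.cardinality() : pos; }
}
```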

I also investigated going the other way and tracking a List<int[]> and
allocating in smaller chunks (and even having a memory pool to pull
the fixed size chunks from) but it was slower on my first attempt and
I haven't returned to try more variants yet.  It *feels* like we
should be able to get overall speedups by allocating in 8K chunks or
so when the effects of memory bandwidth (the cost of zeroing) and GC
are considered.
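
For illustration, the chunked variant could look roughly like the sketch
below. It is not the code that was benchmarked: ChunkedIntBuffer and its
methods are hypothetical names, the memory pool is left out, and "8K" is
assumed here to mean 8 KB (2048 ints) per chunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: collect docIDs into fixed-size int[] chunks. */
final class ChunkedIntBuffer {
    // Assumption: 8 KB per chunk, i.e. 2048 four-byte ints.
    static final int CHUNK_INTS = 2048;

    private final List<int[]> chunks = new ArrayList<>();
    private int[] current;
    private int posInChunk = CHUNK_INTS; // forces allocation on first add
    private int size;

    void add(int doc) {
        if (posInChunk == CHUNK_INTS) {
            // Only one small chunk is allocated (and zeroed) at a time.
            current = new int[CHUNK_INTS];
            chunks.add(current);
            posInChunk = 0;
        }
        current[posInChunk++] = doc;
        size++;
    }

    int size() { return size; }

    int chunkCount() { return chunks.size(); }

    /** Flattens the chunks into one exactly-sized array. */
    int[] toArray() {
        int[] out = new int[size];
        int copied = 0;
        for (int[] chunk : chunks) {
            int n = Math.min(CHUNK_INTS, size - copied);
            System.arraycopy(chunk, 0, out, copied, n);
            copied += n;
        }
        return out;
    }
}
```

With this layout a small result set touches one 8 KB chunk regardless of
maxDoc, at the price of an extra bounds check per collected document and
a final copy.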

-Yonik
