On Sat, Aug 1, 2015 at 2:44 PM, Toke Eskildsen <[email protected]> wrote:
> Before I open a JIRA, I would like to run a sanity check here.
>
> My understanding:
>
> The DocSetCollector is used e.g. when the ResponseBuilder.isNeedDocSet is
> true. This is the case for e.g. field faceting. The DocSetCollector is
> optimistic and works from the assumption that the result set will be less
> than 1/64th of maxDoc. It does this by allocating an int[maxDoc/64], which
> takes up maxDoc/16 bytes. docIDs are collected in this array and when the
> DocSet is to be returned, the array is wrapped in a SortedIntDocSet, which
> reduces it to int[hits].
>
> If the result set exceeds maxDoc/64, a FixedBitSet, which takes up maxDoc/8
> bytes, is created and all the values are copied from the int-array. The
> int-array is not freed, so temporary overhead during collection is now
> maxDoc/16 + maxDoc/8 bytes. When the collection has finished, a BitDocSet
> (maxDoc/8 bytes) is created from the FixedBitSet.
>
> The JavaDoc indicates that the reason for this solution is to avoid very
> sparse bitmaps from small result sets with a FixedBitSet-only strategy. As
> the memory overhead for the current solution varies from 1/2 to 3/2 of the
> pure FixedBitSet-only solution, I assume that the problem with sparse bitmaps
> is not memory but processing time?
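To make the byte counts above concrete, here is a minimal sketch of the arithmetic. The class and method names are made up for illustration and are not part of Solr; it only assumes 4 bytes per int and one bit per document in a long[]-backed bitmap, and ignores object headers.

```java
/** Hypothetical helper for illustration only; not part of Solr or Lucene. */
class DocSetMemoryEstimate {

    /** Bytes used by the optimistic int[maxDoc/64] buffer (4 bytes per int). */
    static long smallSetBytes(int maxDoc) {
        return 4L * (maxDoc >> 6);
    }

    /** Bytes used by a bitmap with one bit per document, backed by a long[]. */
    static long bitSetBytes(int maxDoc) {
        return 8L * (((long) maxDoc + 63) >>> 6);
    }

    /** Peak bytes during collection if the int[] is kept alive after the bitmap is created. */
    static long peakBytesAfterUpgrade(int maxDoc) {
        return smallSetBytes(maxDoc) + bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 64_000_000; // assumed index size, chosen to give round numbers
        long small = smallSetBytes(maxDoc);
        long bits = bitSetBytes(maxDoc);
        long peak = peakBytesAfterUpgrade(maxDoc);
        System.out.println("int[] buffer:       " + small + " bytes");
        System.out.println("bitmap:             " + bits + " bytes");
        System.out.println("peak after upgrade: " + peak + " bytes");
        System.out.println("ratio small/bitmap: " + (double) small / bits);
        System.out.println("ratio peak/bitmap:  " + (double) peak / bits);
    }
}
```

The two ratios correspond to the 1/2 and 3/2 figures in the quoted message.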
You make a good point on the relative sizes of the sets. We could free the
int[] at the same time as the FixedBitSet is allocated.

I also investigated going the other way and tracking a List<int[]> and
allocating in smaller chunks (and even having a memory pool to pull the
fixed-size chunks from), but it was slower on my first attempt and I haven't
returned to try more variants yet. It *feels* like we should be able to get
overall speedups by allocating in 8K chunks or so when the effects of memory
bandwidth (the cost of zeroing) and GC are considered.

-Yonik
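As a rough illustration of freeing the int[] when the bitmap is allocated, here is a simplified, self-contained sketch. It is not the actual DocSetCollector: the class name is hypothetical, java.util.BitSet stands in for Lucene's FixedBitSet, and it assumes docIDs are unique and arrive in increasing order.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical stand-in for the collector; not Solr's actual class. */
class SketchDocSetCollector {
    private final int maxDoc;
    private int[] scratch; // optimistic buffer of maxDoc/64 ints; null after the upgrade
    private BitSet bits;   // stand-in for FixedBitSet; null until the upgrade
    private int count;

    SketchDocSetCollector(int maxDoc) {
        this.maxDoc = maxDoc;
        this.scratch = new int[maxDoc >> 6];
    }

    /** Collects one docID (assumed unique and in increasing order). */
    void collect(int doc) {
        if (bits == null) {
            if (count < scratch.length) {
                scratch[count++] = doc;
                return;
            }
            // Buffer is full: switch to a bitmap and copy the collected docIDs over.
            bits = new BitSet(maxDoc);
            for (int i = 0; i < count; i++) {
                bits.set(scratch[i]);
            }
            scratch = null; // drop the reference so the int[] can be garbage collected
        }
        bits.set(doc);
        count++;
    }

    boolean isBitSetBacked() { return bits != null; }

    boolean scratchReleased() { return scratch == null; }

    int size() { return count; }

    /** Returns the collected docIDs in increasing order. */
    int[] sortedDocs() {
        return bits == null ? Arrays.copyOf(scratch, count) : bits.stream().toArray();
    }
}
```

With this change the peak during collection would be bounded by the bitmap plus the int[] only for the duration of the copy, rather than for the rest of the collection.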
