I wonder if there might be value in BitDocIdSet.Builder, which Lucene uses. It had perf issues of its own, but LUCENE-6645 seems to have fixed them, and it takes a similar approach to the one above (an int array first, then a FixedBitSet).

On 3 Aug 2015 12:35, "Toke Eskildsen" <t...@statsbiblioteket.dk> wrote:
> On Sat, 2015-08-01 at 15:09 -0700, Yonik Seeley wrote:
> > I also investigated going the other way and tracking a List<int[]> and
> > allocating in smaller chunks (and even having a memory pool to pull
> > the fixed size chunks from) but it was slower on my first attempt and
> > I haven't returned to try more variants yet. It *feels* like we
> > should be able to get overall speedups by allocating in 8K chunks or
> > so when the effects of memory bandwidth (the cost of zeroing) and GC
> > are considered.
>
> Chunked allocations of int[] would still have the problem of having the
> copy-to-bitmap step if the result set gets too big.
>
> Chunks might work better with the garbage collector, compared to the
> current solution, but I greatly prefer the idea of re-using structures.
>
> That being said, I realize that it is not simple to choose the proper
> strategy:
>
> http://stackoverflow.com/questions/1955322/at-what-point-is-it-worth-reusing-arrays-in-java
>
> In the case of an update-tracked structure, the cost of zeroing is
> linear to the amount of changed values. This makes it even harder to
> determine the best strategy as it will be tied to concrete index size
> and query pattern.
>
> - Toke Eskildsen, State and University Library, Denmark
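
For reference, the "int array first, then bitset" idea discussed above can be sketched roughly as follows. This is only an illustration and not the actual BitDocIdSet.Builder code: the class name SparseThenDenseCollector, the maxDoc >>> 6 cut-over threshold, and the use of java.util.BitSet in place of Lucene's FixedBitSet are all assumptions made to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch: collect doc IDs in a growing int[] while the result
 * is small, then copy them into a bitset once a threshold is exceeded.
 */
final class SparseThenDenseCollector {
    private final int maxDoc;
    private final int threshold; // assumed cut-over point, not Lucene's heuristic
    private int[] buffer = new int[16];
    private int size = 0;
    private BitSet bits;         // null while the collector is still sparse

    SparseThenDenseCollector(int maxDoc) {
        this.maxDoc = maxDoc;
        this.threshold = Math.max(1, maxDoc >>> 6);
    }

    void add(int docId) {
        if (bits == null && size == threshold) {
            upgrade(); // the copy-to-bitmap step mentioned in the thread
        }
        if (bits != null) {
            bits.set(docId);
            return;
        }
        if (size == buffer.length) {
            // Grow geometrically, but never beyond the cut-over threshold.
            buffer = Arrays.copyOf(buffer, Math.min(threshold, buffer.length * 2));
        }
        buffer[size++] = docId;
    }

    /** Copies the buffered IDs into a bitset and drops the int[]. */
    private void upgrade() {
        bits = new BitSet(maxDoc);
        for (int i = 0; i < size; i++) {
            bits.set(buffer[i]);
        }
        buffer = null;
    }

    boolean isDense() {
        return bits != null;
    }

    boolean contains(int docId) {
        if (bits != null) {
            return bits.get(docId);
        }
        for (int i = 0; i < size; i++) { // linear scan is fine while sparse
            if (buffer[i] == docId) {
                return true;
            }
        }
        return false;
    }
}
```

With maxDoc = 6400 the assumed threshold is 100, so the first 100 calls to add() stay in the int[] and the 101st triggers the one-time copy into the bitset. A small result set therefore never pays for allocating and zeroing maxDoc bits, which is the trade-off being weighed in the thread.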