I wonder if there might be value in BitDocIdSet.Builder which Lucene uses.
It had perf issues of its soon, but LUCENE-6645 seems to have fixed them,
and it does a similar approach as above (int array and then fixedbitset).
On 3 Aug 2015 12:35, "Toke Eskildsen" <t...@statsbiblioteket.dk> wrote:

> On Sat, 2015-08-01 at 15:09 -0700, Yonik Seeley wrote:
> > I also investigated going the other way and tracking a List<int[]> and
> > allocating in smaller chunks (and even having a memory pool to pull
> > the fixed size chunks from) but it was slower on my first attempt and
> > I haven't returned to try more variants yet.  It *feels* like we
> > should be able to get overall speedups by allocating in 8K chunks or
> > so when the effects of memory bandwidth (the cost of zeroing) and GC
> > are considered.
>
> Chunked allocations of int[] would still have the problem of having the
> copy-to-bitmap step if the result set gets too big.
>
> Chunks might work better with the garbage collector, compared to the
> current solution, but I greatly prefer the idea of re-using structures.
>
> That being said, I realize that it is not simple to choose the proper
> strategy:
>
> http://stackoverflow.com/questions/1955322/at-what-point-is-it-worth-reusing-arrays-in-java
>
> In the case of an update-tracked structure, the cost of zeroing is
> linear to the amount of changed values. This makes it even harder to
> determine the best strategy as it will be tied to concrete index size
> and query pattern.
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>

Reply via email to