[
https://issues.apache.org/jira/browse/LUCENE-7258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286858#comment-15286858
]
David Smiley commented on LUCENE-7258:
--------------------------------------
nit: typo in comment line 96: "cumulated" -> "accumulated"
In ensureBufferCapacity, when buffers.isEmpty, I think the first buffer should
have a minimum size of 64 (or 32?), not 1. This will avoid a possible slow
start of small buffers when numDocs is 0 or 1. At least I saw this while
setting a breakpoint in some spatial tests, seeing the first two buffers both
of size one and the 3rd of size two, etc.
Also in this method...
{code:java}
if (current.length < current.array.length - (current.array.length >>> 2)) {
// current buffer is less than 7/8 full, resize rather than waste space
{code}
That calculation is not 7/8, it's 3/4.
In concat(), it'd be helpful to comment that it not only concatenates but
leaves one additional space too. Also... do you think it might be worth
optimizing for the case that there is one buffer that can simply be returned?
If when this happens it tends to be exactly full then maybe when we allocate
new buffers we can leave that one additional slot there so that this happens
more often.
For readability sake, can you re-order the methods grow, ensureBufferCapacity,
and addBuffer, growBuffer, upgradeToBitSet to be in this sequence (or
thereabouts) as that is the sequence of who calls who? I find it much easier
to read code top to bottom than bottom up :-) Likewise, build() could be
defined before the private utility methods it calls.
> Tune DocIdSetBuilder allocation rate
> ------------------------------------
>
> Key: LUCENE-7258
> URL: https://issues.apache.org/jira/browse/LUCENE-7258
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spatial
> Reporter: Jeff Wartes
> Attachments:
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> LUCENE-7258-Tune-memory-allocation-rate-for-Intersec.patch,
> LUCENE-7258-expanding.patch, allocation_plot.jpg
>
>
> LUCENE-7211 converted IntersectsPrefixTreeQuery to use DocIdSetBuilder, but
> didn't actually reduce garbage generation for my Solr index.
> Since something like 40% of my garbage (by space) is now attributed to
> DocIdSetBuilder.growBuffer, I charted a few different allocation strategies
> to see if I could tune things more.
> See here: http://i.imgur.com/7sXLAYv.jpg
> The jump-then-flatline at the right would be where DocIdSetBuilder gives up
> and allocates a FixedBitSet for a 100M-doc index. (The 1M-doc index
> curve/cutoff looked similar)
> Perhaps unsurprisingly, the 1/8th growth factor in ArrayUtil.oversize is
> terrible from an allocation standpoint if you're doing a lot of expansions,
> and is especially terrible when used to build a short-lived data structure
> like this one.
> By the time it goes with the FBS, it's allocated around twice as much memory
> for the buffer as it would have needed for just the FBS.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]