Re: Dynamic array reallocation algorithms

Michael McCandless Wed, 13 Jan 2010 02:48:41 -0800

Phew, good svn sleuthing Marvin!

Responses below...


On Tue, Jan 12, 2010 at 6:27 PM, Marvin Humphrey <mar...@rectangular.com> wrote:
> Greets,
>
> I've been trying to understand this comment regarding ArrayUtil.getNextSize():
>
>     * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
>
> Maybe I'm missing something, but I can't see how the formula yields such a
> growth pattern:
>
>    return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize;

Indeed, I can't either!  Have you notified python-dev?  Curious.

> For input values of 9 or greater, all that formula does is multiply by 1.125
> and add 6. (Values enumerated below my sig.)
>
> The comment appears to have originated with this Python commit:
>
>    
> http://svn.python.org/view/python/trunk/Objects/listobject.c?r1=35279&r2=35280
>
> I think it was wrong then, and it's wrong now.
>
> The primary purpose of getNextSize() is to minimize reallocations during
> dynamic array resizing by overallocating on certain actions.

Right, and also to strike a balance with not wasting too much
over-allocated but not-yet-used RAM (hence 1/8 growth, not 1/2 or 1).

> Exactly how much we overallocate by doesn't seem to matter that much.  Python
> apparently adds an extra eighth or so.  Ruby reportedly multiplies by 1.5.
> Theoretically, multipliers larger than the golden mean are supposed to be
> suboptimal because they tend to induce memory fragmentation: subsequent
> reallocations cannot reuse previously freed sections, because they never add
> up to the total required by the newly requested fragment.  However, that
> assumes a reasonably closed memory usage pattern, and so long as the freed
> fragment can be reused by someone else somewhere, it won't go to waste.
>
> IMO, minimizing memory fragmentation is so dependent on the internal
> implementation of the system's memory allocator as to be not worth the
> trouble,

I agree, and being in javaland muddies up the implementation
dependence even more, for Lucene java.

> but if we were to do it, I think the right approach is outlined in
> this comment documenting the intention of the Python resizing routine prior to
> the commit that introduced the new (broken?) algo:
>
>    
> http://svn.python.org/view/python/trunk/Objects/listobject.c?revision=35125&view=markup
>
>    /* Round up:
>     * If n <       256, to a multiple of        8.
>     * If n <      2048, to a multiple of       64.
>     * If n <     16384, to a multiple of      512.
>     * If n <    131072, to a multiple of     4096.
>     * If n <   1048576, to a multiple of    32768.
>     * If n <   8388608, to a multiple of   262144.
>     * If n <  67108864, to a multiple of  2097152.
>     * If n < 536870912, to a multiple of 16777216.
>     * ...
>     * If n < 2**(5+3*i), to a multiple of 2**(3*i).
>
> I can't really see the point of adding the small constant (6) for large
> values, as is done in the new algo.

I agree.

> And if oversizing is important for small
> values (debatable, since there will always be lots of small memory chunks
> floating around in the allocation pool), then rounding up to 8 consistently
> makes more sense to me than the current behavior.
>
> IMO, just overallocating by some multiplier between 1.125 and 1.5 achieves our
> primary goal of avoiding pathological reallocation behavior, and that's 
> enough.
> How about simplifying ArrayUtil.getNextSize() down to this?
>
>    return targetSize + (targetSize / 8);

But, you have to do something different when targetSize < 8, else that
formula doesn't grow.

Also, I think for smallish sizes we want faster than 12.5% growth,
because the constant-time cost of the mechanics of doing a
reallocation are at that point relatively high (wrt the cost of
copying the bytes over to the new array).  I think the +3 or +6 in the
current formula achieves that, but I agree, for larger sizes the +6
seems silly.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Re: Dynamic array reallocation algorithms

Reply via email to