Phew, good svn sleuthing Marvin! Responses below...
On Tue, Jan 12, 2010 at 6:27 PM, Marvin Humphrey <mar...@rectangular.com> wrote: > Greets, > > I've been trying to understand this comment regarding ArrayUtil.getNextSize(): > > * The growth pattern is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ... > > Maybe I'm missing something, but I can't see how the formula yields such a > growth pattern: > > return (targetSize >> 3) + (targetSize < 9 ? 3 : 6) + targetSize; Indeed, I can't either! Have you notified python-dev? Curious. > For input values of 9 or greater, all that formula does is multiply by 1.125 > and add 6. (Values enumerated below my sig.) > > The comment appears to have originated with this Python commit: > > > http://svn.python.org/view/python/trunk/Objects/listobject.c?r1=35279&r2=35280 > > I think it was wrong then, and it's wrong now. > > The primary purpose of getNextSize() is to minimize reallocations during > dynamic array resizing by overallocating on certain actions. Right, and also to strike a balance with not wasting too much over-allocated but not-yet-used RAM (hence 1/8 growth, not 1/2 or 1). > Exactly how much we overallocate by doesn't seem to matter that much. Python > apparently adds an extra eighth or so. Ruby reportedly multiplies by 1.5. > Theoretically, multipliers larger than the golden mean are supposed to be > suboptimal because they tend to induce memory fragmentation: subsequent > reallocations cannot reuse previously freed sections, because they never add > up to the total required by the newly requested fragment. However, that > assumes a reasonably closed memory usage pattern, and so long as the freed > fragment can be reused by someone else somewhere, it won't go to waste. > > IMO, minimizing memory fragmentation is so dependent on the internal > implementation of the system's memory allocator as to be not worth the > trouble, I agree, and being in javaland muddies up the implementation dependence even more, for Lucene java. > but if we were to do it, I think the right approach is outlined in > this comment documenting the intention of the Python resizing routine prior to > the commit that introduced the new (broken?) algo: > > > http://svn.python.org/view/python/trunk/Objects/listobject.c?revision=35125&view=markup > > /* Round up: > * If n < 256, to a multiple of 8. > * If n < 2048, to a multiple of 64. > * If n < 16384, to a multiple of 512. > * If n < 131072, to a multiple of 4096. > * If n < 1048576, to a multiple of 32768. > * If n < 8388608, to a multiple of 262144. > * If n < 67108864, to a multiple of 2097152. > * If n < 536870912, to a multiple of 16777216. > * ... > * If n < 2**(5+3*i), to a multiple of 2**(3*i). > > I can't really see the point of adding the small constant (6) for large > values, as is done in the new algo. I agree. > And if oversizing is important for small > values (debatable, since there will always be lots of small memory chunks > floating around in the allocation pool), then rounding up to 8 consistently > makes more sense to me than the current behavior. > > IMO, just overallocating by some multiplier between 1.125 and 1.5 achieves our > primary goal of avoiding pathological reallocation behavior, and that's > enough. > How about simplifying ArrayUtil.getNextSize() down to this? > > return targetSize + (targetSize / 8); But, you have to do something different when targetSize < 8, else that formula doesn't grow. Also, I think for smallish sizes we want faster than 12.5% growth, because the constant-time cost of the mechanics of doing a reallocation are at that point relatively high (wrt the cost of copying the bytes over to the new array). I think the +3 or +6 in the current formula achieves that, but I agree, for larger sizes the +6 seems silly. Mike --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org