On Wed, Jan 13, 2010 at 05:48:08AM -0500, Michael McCandless wrote:
> Have you notified python-dev?

No, not yet.  Is it kosher under the Python license to have copied-and-pasted
that comment?  From what I can see, it isn't credited.  A small thing, but we
should probably fix it.

> Right, and also to strike a balance with not wasting too much
> over-allocated but not-yet-used RAM (hence 1/8 growth, not 1/2 or 1).

I agree, a smaller size is better.  Say you start with an array which holds 800
elements but which has been over-allocated by one eighth (+100 elements =
900).  Reallocating at 900 and again at 1000-something isn't that different
from reallocating only once at 1000.  So long as you avoid reallocating
at every increment -- 801, 802, etc. -- you have achieved your main goal.

Both mild and aggressive over-allocation solve that problem, but aggressive
over-allocation also carries significant RAM costs.  Where the best balance
lies depends on how expensive reallocation is relative to the cost of
insertion.  An eighth seems pretty good.  Doubling seems too high.  1.5 --
dunno, it seems like it could work.  According to this superficially
credible-seeming comment on Stack Overflow, that's what Ruby does:

http://stackoverflow.com/questions/1100311/what-is-the-ideal-growth-rate-for-a-dynamically-allocated-array/1100449#1100449
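The trade-off above is easy to simulate.  Here's a quick sketch (hypothetical code, not from Lucene or Lucy) that counts how many reallocations each growth factor needs to take a one-element array to a million elements, and how much slack is left over at the end:

```java
// Hypothetical simulation: compare growth factors by reallocation count
// and by unused slots reserved once the target size is reached.
public class GrowthSim {
    // Returns { number of reallocations, unused slots at the end }.
    static int[] simulate(double factor, int target) {
        int capacity = 1;
        int reallocs = 0;
        while (capacity < target) {
            // Grow by the factor, but always make progress by at least one slot.
            capacity = Math.max(capacity + 1, (int) (capacity * factor));
            reallocs++;
        }
        return new int[] { reallocs, capacity - target };
    }

    public static void main(String[] args) {
        for (double factor : new double[] { 1.125, 1.5, 2.0 }) {
            int[] r = simulate(factor, 1_000_000);
            System.out.println(factor + "x growth: " + r[0]
                + " reallocations, " + r[1] + " unused slots at the end");
        }
    }
}
```

Running something like this shows the shape of the trade-off: 2x needs the fewest reallocations but strands the most unused RAM, while 1.125x reallocates far more often but wastes little.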

> But, you have to do something different when targetSize < 8, else that
> formula doesn't grow.

That's right, it was by design.  For small sizes, things get tricky.  

If it's a byte array, you definitely want to round up to the size of a
pointer, and as we enter the era of ubiquitous 64-bit processors, rounding up
to the nearest multiple of 8 seems proper.  
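To make the small-size problem concrete, here's a sketch (assumed formula, not the actual Lucene or Lucy code) showing why a pure 1/8 over-allocation stalls below 8, and how rounding up to a multiple of 8 keeps small requests growing:

```java
public class Oversize {
    // A pure 1/8 over-allocation: for targetSize < 8, integer division
    // truncates targetSize / 8 to zero, so the array never grows.
    static int naiveOversize(int targetSize) {
        return targetSize + (targetSize / 8);
    }

    // One possible fix for byte arrays: round the result up to the nearest
    // multiple of 8 (pointer size on 64-bit machines), so even tiny
    // requests make progress.
    static int roundedOversize(int targetSize) {
        int extra = targetSize / 8;
        return (targetSize + extra + 7) & ~7;  // round up to a multiple of 8
    }

    public static void main(String[] args) {
        System.out.println(naiveOversize(7));    // stalls: still 7
        System.out.println(roundedOversize(7));  // grows: 8
    }
}
```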

But what about arrays of objects?  Would we really want every two-element
array to reserve space for 8 objects?  

The way I've handled this in a forthcoming patch to Lucy is, in most cases, to
trust the user about the size they want and count on them to supply the logic
for small sizes (as was done in TermBufferImpl.java).  The Grow() methods of
CharBuf, ByteBuf, and VArray obey the exact amount -- if you want an
over-allocation, you invoke the over-estimator on the argument you supply to
Grow().  Only the incremental appends like VArray's Push() or CharBuf's
Cat_Char() trigger the automatic over-allocation internally.
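The division of labor described above might look like this sketch (hypothetical names in Java for illustration -- Lucy's actual API is C, and these are not its real signatures): grow() honors the exact request, and only the incremental append applies the over-estimator internally.

```java
import java.util.Arrays;

public class DynArray {
    private Object[] elems = new Object[0];
    private int size = 0;

    // Hypothetical over-estimator: 1/8 growth, with a small floor so tiny
    // requests still make progress.
    static int overestimate(int minSize) {
        return Math.max(minSize + (minSize / 8), minSize + 3);
    }

    // grow() allocates exactly what the caller asked for.  Callers who want
    // slack call grow(overestimate(n)) themselves.
    public void grow(int capacity) {
        if (capacity > elems.length) {
            elems = Arrays.copyOf(elems, capacity);
        }
    }

    // push() is the incremental path, so it over-allocates automatically.
    public void push(Object o) {
        if (size == elems.length) {
            grow(overestimate(size + 1));
        }
        elems[size++] = o;
    }

    public int capacity() { return elems.length; }
    public int size() { return size; }
}
```

The point of the split is that bulk callers who already know their final size pay no over-allocation tax, while incremental callers still get amortized growth.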

> Also, I think for smallish sizes we want faster than 12.5% growth,
> because the constant-time cost of the mechanics of doing a
> reallocation are at that point relatively high (wrt the cost of
> copying the bytes over to the new array).

But is that important if, in most of the cases where the array grows
incrementally, you've already over-allocated manually?

Marvin Humphrey

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
