Re: Dynamic array reallocation algorithms

DM Smith Wed, 13 Jan 2010 05:53:52 -0800

On Jan 13, 2010, at 1:00 AM, Marvin Humphrey wrote:

> On Tue, Jan 12, 2010 at 10:46:29PM -0500, DM Smith wrote:
> 
>> So starting at 0, the size is 0.
>> 0 => 0
>> 0 + 1 => 4
>> 4 + 1 => 8
>> 8 + 1 => 16
>> 16 + 1 => 25
>> 25 + 1 => 35
>> ...
>> 
>> So I think the copied python comment is correct but not obviously correct.
> 
> So those numbers are supposed to be where the transitions occur?  But that's
> not where the jumps are.  The jumps happen at...
> 
>    8, 16, 24, 32, 40, 48...
> 
> ... as you'd expect when adding (input >> 3), which is after all just a more
> obscure way of writing (input / 8).


Not internally. n = n >> 3 is a bit pattern change. n = n / 8, in Java will 
produce a floating point intermediate and the assignment will do a truncation.

> 
> Sorry for being dense, but I still don't get it.

No, there are no "supposed" transitions. The pattern that is stated in the 
comment only occurs under a very specific situation: The growing of an array 
one element at a time and reallocation only when the current is exceeded.

However, this pattern may not apply for other situations. If there is a bulk 
insert where the array is grown sufficiently for the entire insert and then the 
elements are added, then the stated pattern does not happen. Example, a 
character array with a char sequence append.

I'm guessing your comment "I still don't get it." really is about the merits of 
the formula. My response was not to its merits.

> 
>> The addition of 3 or 6 only helps initially, after some point it is just
>> noise. It has the characteristic of being less aggressive with subsequent
>> allocations.
> 
> Well, I have my doubts about whether this actually helps or not.  It doesn't
> seem general purpose enough.

I have my doubts too.

Another algorithm I had seen allocates 11 slots if the size is unstated or less 
than 11. (Don't know why it was 11, but it stuck in my mind because it was 
odd.) After that the formula kicked in.

> 
> For an array of bytes, the desirable behavior is clear -- you really ought
> to round up to at least the size of a pointer because you're never going to
> return a non-aligned buffer.  Feeding 10 into getNextSize() and getting back
> 17 is weird -- you should get back either 16 or 24.

I understand the argument for byte alignment and have seen its effect. WRT to 
an array there may be an overhead to the array in addition to the size of the 
each slot. Lets assume a system that has 32 bits to hold the size of the array, 
then maybe the number of slots should be 1 less than the pattern that you have 
given? Maybe the overhead is different for small arrays (e.g. 8 bits), medium 
arrays (16 bits), .... If so then maybe an addition of 3 or 6 makes sense for 
the python interpreter? Maybe some other numbers would be appropriate for Java. 
Maybe it does not matter at all.

> 
> I also consider it strange that if you ask for 0 you get 3.  A lot of the
> time, if you're asking for 0 it's because the resource may never need to be
> allocated.

ArrayUtil.getNextSize should not be called for 0, should return a specific 
minimum (e.g. MIN_BUFFER_SIZE) or return 0.

If it does not return 0 for 0, then it should not be used when 0 is wanted. 
This makes code even more awkward.

I prefer that 0 return 0.


>  And if you know that the resource actually is going to be needed,
> you're going to write code like this, from TermAttributeImpl.java:
> 
>    termBuffer = new char[ArrayUtil.getNextSize(newSize < MIN_BUFFER_SIZE ?  
> MIN_BUFFER_SIZE : newSize)];
> 
> (MIN_BUFFER_SIZE is 10.)
> 
> But whatever.  "3" and "6" look more significant on first read of the code
> than they actually are, but they're only strange, not detrimental to
> performance.

I have less problem with 3 and 6 than the addition of the requested size. 3 and 
6 might be an alignment factor as noted above. But the requested size is most 
likely to yield an amount that is not aligned.

I think it would be better to add in the next boundary greater than the 
requested size:

growth factor + alignment factor + size adjusted to the next boundary.

(n>>3) + af + (((n>>3) + 1) << 3)

>  I'm just ticked off because I spent so much time trying to
> understand code which turns out to do so little.

What is needed may differ by the array that is being grown. A single purpose 
algorithm is not appropriate for all situations. Growing an array by 1/8 may be 
too aggressive for slow growing arrays and may not be aggressive for fast 
growing arrays. I think our guess is that arrays need to grow more quickly at 
first and less later.

But without knowing where that inflection should occur how can it be tuned 
otherwise.

If it is known in advance that an array will be of a particular magnitude, then 
it is best to pre-allocate that array to that size.

maybe growth factor should be based on n as in

n << (n < y ? 2 : 3)

Maybe doubling early, growing by 1/2 for a while then by 1/8 then by 1/16, .....

And now my head hurts....

> 
>> I'm not really up on whether this is best, but it is better than the
>> doubling algorithm that it replaced. I think your suggestion that such an
>> algorithm might contribute to fragmented memory is interesting. I wonder if
>> C, perl and java have different issues regarding that.
> 
> It's wherever the memory allocator lives.  That could be the JVM, it could be
> glibc, it could be your own custom allocator.  If you compile Perl with
> -DUSE_MY_MALLOC it uses its own allocator, otherwise it uses the system's
> malloc.  KinoSearch actually has a dedicated allocator it uses for a very
> targeted purpose, and this allocator has its own strategy for avoiding
> fragmentation.  The golden mean issue is relevant to all of those allocators.
> 
> Marvin Humphrey
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Re: Dynamic array reallocation algorithms

Reply via email to