On Fri, 02 Jul 2010 01:34:26 -0400, Ali Çehreli <[email protected]> wrote:

BCS wrote:

 >> There is something strange in the array capacity algorithm.
 >> [...]
 >> Is that intended?

 > I'm going to take a guess that the gc is (after some point) allocating
 > into the smallest hole with "enough" capacity.

You may be right. :)

 > If you re run the test
> but with something else going on to add some noise, do the spikes move?

Yes, the spikes move.

 > void makeNoise()
 > {
 >      static byte[][256] data;
 >      data[rand() % $] = new byte[rand() % 512];
 > }

I am not familiar with the dmd internals; but following the code took me to the function newCapacity() in src/druntime/src/rt/lifetime.d. The capacity growth is more complicated than what I've been assuming so far.

And yes, GC is in the picture. Quoting the comments from that file:

/*
  * Better version by Dave Fladebo:
  * This uses an inverse logorithmic algorithm to pre-allocate a bit more
  * space for larger arrays.
  * - Arrays smaller than PAGESIZE bytes are left as-is, so for the most
  * common cases, memory allocation is 1 to 1. The small overhead added
  * doesn't affect small array perf. (it's virtually the same as
  * current).
  * - Larger arrays have some space pre-allocated.
  * - As the arrays grow, the relative pre-allocated space shrinks.
  * - The logorithmic algorithm allocates relatively more space for
  * mid-size arrays, making it very fast for medium arrays (for
  * mid-to-large arrays, this turns out to be quite a bit faster than the
  * equivalent realloc() code in C, on Linux at least. Small arrays are
  * just as fast as GCC).
  * - Perhaps most importantly, overall memory usage and stress on the GC
  * is decreased significantly for demanding environments.
  */

So there may not be a bug after all...

Ali

If your original code is all that was being run, I think there is a bug.

I'll tell you why -- the GC provides a way to append pages to an allocated block. Essentially, you can glue consecutive unused pages together to make a larger block, thereby growing your block.

But anything under a page is simply reallocated because anything less than a page is inside a pool of blocks of the same size, and those cannot be glued together.

So when your array is growing, and it gets larger than a page, it should grow a single page at a time. But I think (and I stress think, I'm not sure) that when a block of pages is deallocated, each page becomes independently free. They do not stay glued together. So for an allocating requesting 1 or 2 more pages to grow by 2000 pages is suspect. I still have to examine the code, but I think there is a problem.

Back to my original assertion -- to disprove the theory that some other "larger hole" is why it gets so big, in one instance your number of elements grows from 1,155,070 (4MB) to 3,153,917 (12MB), that's almost a 200% increase. In order for that 12MB hole to exist, something must have allocated it before, and then deallocated it. It can't be your loop because your loop has not allocated that much space yet. I would find that very strange to happen somewhere in the runtime without you specifically allocating it. If however, your original post is not the entire code, then there may be some possibility to that.

BTW, you can take the newCapacity function out of the loop by simply setting the length and then writing the last element, i.e.:

a.length += 1;
a[$-1] = i;

newCapacity isn't called when setting the length explicitly, because it's not considered to be an append.

-Steve

Reply via email to