Hello again!

Today, I had trouble using a D built-in dynamic array as a stack: it timed out badly. I then tried to reduce the problem to a minimal example. I am using DMD 2.062.

Now, I have this sample program. It creates an array of length one million, and then, in a loop that also runs one million times, keeps adding an element to the array and removing it again. Such a pattern, and more complex ones, can easily occur whenever an array is used as a stack.

What actually happens is constant reallocation and copying: each increase of the length moves the whole million-element array, so the loop performs on the order of 1_000_000 * 1_000_000 element copies, which looks bad to me.

-----
import core.memory;
import std.range;
import std.stdio;

void main ()
{
        version (NOGC) {GC.disable ();}
        int n = 1_000_000;
        auto s = array (iota (n));  // an array of one million ints
        s.reserve (n * 2);          // ask for room for twice as many up front
        foreach (i; 0..n)
        {
                s.length++;         // "push" one element
                debug {writefln ("after ++: %s %s", s.capacity, &s[0]);}
                s.length--;         // "pop" it again
                debug {writefln ("after --: %s %s", s.capacity, &s[0]);}
        }
}
-----

Running the debug build, we can see that the address of s[0] changes after each increase of the length, and the capacity is reduced to zero after each decrease. So the line "s.reserve (n * 2)" fails to hint to the memory manager that I want to allocate the array once and then use it without relocation - and that is exactly what I would like to be able to do!

I wondered whether garbage collection has something to do with the inefficiency, but the answer seems to be no. If we build the "NOGC" version, the program quickly fills a large amount of memory (1 GB in my local test) with the copies and stops responding. With the GC active, it just constantly swaps between two memory locations.

Now, in section 4.1.10 of TDPL ("Assigning to .length") it is stated that "If the array shrinks <...>, D guarantees that the array is not reallocated", and later, "that guarantee does not also imply that further expansions of the array will avoid reallocation", so, formally, everything is by the letter of the law so far.

However, inserting another "s.reserve (n * 2)" just after "s.length--" does not help either. And I think the whole thing does contradict the intent described in TDPL section 4.1.9 ("Expanding"). Here is the relevant quote: "If there is no slack space left, ... The allocator may find an empty block to the right of the current block and gobble it into the current block. This operation is known as coalescing". Now, this quote and its context give no formal guarantee that the memory allocator works exactly like that, but they definitely sound like the right thing to do.
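
In code, that attempt looks like this (the debug output and the NOGC switch are stripped, and I import std.array explicitly just to keep the snippet self-contained):

-----
import std.array;
import std.range;

void main ()
{
        int n = 1_000_000;
        auto s = array (iota (n));
        s.reserve (n * 2);
        foreach (i; 0..n)
        {
                s.length++;
                s.length--;
                s.reserve (n * 2); // re-request the slack after each shrink: no luck
        }
}
-----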

I hope many would agree that reducing the length once does not at all imply that we want to reduce it further. Frankly, until now I have thought that D dynamic arrays can be used as queue and stack implementations done "right", i.e., efficiently.
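
To make the pattern concrete, here is a toy sketch of what I mean by "an array used as a stack" (not the program that timed out, just the shape of the code):

-----
import std.stdio;

void main ()
{
        int [] stack;
        stack.reserve (100); // hope: allocate once, never relocate afterwards

        // push: append at the end
        foreach (i; 0..5)
        {
                stack ~= i * i;
        }

        // pop: read the top element, then drop it by shrinking the length
        while (stack.length > 0)
        {
                writeln (stack[$ - 1]);
                stack.length--;
        }
}
-----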

So my two questions are:

1. I would like to have a way to reduce the length of the array without losing the guarantees provided by the preceding "s.reserve". Is that possible in the current D implementation? If so, how?

2. Ideally, I would like a dynamic array in D to act efficiently as a stack; that is, the amortized cost of N stack operations should be O(N). To achieve this, I would propose "lazy" reduction of the space reserved for the array. I suppose the implementation already does something similar for expanding arrays, as shown in the example at http://dlang.org/arrays.html#resize : when the capacity is exhausted, the next expansion roughly doubles the allocated size of the array. The very same trick could be used for shrinking: instead of reducing the capacity to zero (i.e., keeping a guarantee only for the exact amount in use), the memory manager could leave the allocated size equal to, for example, min (prev, cur * 2), where prev is the allocated size before the shrinking and cur is the size in use after the shrinking.
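
In rough code, the policy I have in mind is just this (the function name and the bookkeeping parameters are made up; it is a sketch of the idea, not of how druntime actually organizes things):

-----
import std.algorithm;

// Hypothetical: how much space to keep reserved for an array after it shrinks.
// prevReserved - elements reserved before the shrink,
// curUsed      - elements still in use after the shrink.
size_t reservedAfterShrink (size_t prevReserved, size_t curUsed)
{
        // Keep up to twice the used size, but never enlarge the reservation
        // just because of a shrink.
        return min (prevReserved, curUsed * 2);
}
-----

With such a rule, a sequence of N pushes and pops like the one in my example would keep reusing the same block instead of losing its reserve on every pop.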

I suspect that the above suggestion could conflict with some other array usage patterns, because the array syntax actually deals with array views, not with arrays "in the flesh" (in memory). One case I can imagine is the following:

-----
import std.range;
import std.stdio;

void main ()
{
        auto a = array (iota (30)); // [0, 1, ..., 29]
        auto b = a[10..$]; // [10, 11, ..., 29]
        b.length -= 10; // *** what is b.capacity now?
        b.length += 10; // now b should be [10, 11, ..., 19, 0, 0, ..., 0]
        writeln (b);
}
-----

Now, at the line marked ***, if the memory manager keeps the 10 freed slots as spare capacity for b, that spare space points at memory still occupied by the array a (its elements 20 through 29): the subsequent "b.length += 10" could then re-expose or overwrite a's data instead of appending fresh zeroes, which does not sound right. As I'm not a D expert (yet? ;) - such investigations are insightful), I don't know right now whether this problem is solvable. Please share your thoughts on that.

But at least when there are no other views into the memory past the end of b (as in my first example, and as is generally true when an array is used as a stack), the memory manager could detect that and make an optimal decision regarding b.capacity.
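
Roughly, the check I imagine is the following (the names for the block bookkeeping are invented; I do not know what the real memory manager actually tracks):

-----
// Hypothetical: may a slice keep spare capacity after shrinking?
// blockStart and blockUsed would come from the memory manager's bookkeeping
// for the block the slice lives in.
bool mayKeepCapacity (const(int)[] slice, const(int)* blockStart, size_t blockUsed)
{
        // Safe only if the slice ends exactly where the block's used data ends:
        // then no other view owns the elements just past the slice.
        return slice.ptr + slice.length == blockStart + blockUsed;
}
-----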

Thank you for reading to this point; I confess this was rather lengthy.

-----
Ivan Kazmenko.
