Hello again!

Today, I had trouble using a D built-in dynamic array as a stack: it timed out badly. I then tried to reduce the problem to a minimal example. I am using DMD 2.062.

Now, I have this sample program. It creates an array of length one million, and then, in a loop that also runs one million times, keeps adding an element to the array and removing it again. Such a pattern, and more complex ones, can easily occur whenever an array is used as a stack.

What actually happens is constant reallocation and copying: each increase of the length moves the whole million-element array, so the loop performs on the order of 1_000_000 * 1_000_000 element copies, which looks bad to me.

-----
import core.memory;
import std.range;
import std.stdio;

void main ()
{
        version (NOGC) {GC.disable ();}
        int n = 1_000_000;
        auto s = array (iota (n));  // an array of one million ints
        s.reserve (n * 2);          // ask for room for twice as many up front
        foreach (i; 0..n)
        {
                s.length++;         // "push" one element
                debug {writefln ("after ++: %s %s", s.capacity, &s[0]);}
                s.length--;         // "pop" it again
                debug {writefln ("after --: %s %s", s.capacity, &s[0]);}
        }
}
-----

Running the debug build, we can see that the address of s[0] changes after each increase of the length, and the capacity is reduced to zero after each decrease. So the line "s.reserve (n * 2)" fails to hint to the memory manager that I want to allocate the array once and then use it without relocation - and that is exactly what I would like to be able to do!

I wondered whether garbage collection has something to do with the inefficiency, but the answer seems to be no. If we build the "NOGC" version, the program quickly fills a large amount of memory (1 GB in my local test) with the copies and stops responding. With the GC active, it just constantly swaps between two memory locations.

Now, in section 4.1.10 of TDPL ("Assigning to .length") it is stated that "If the array shrinks <...>, D guarantees that the array is not reallocated", and later, "that guarantee does not also imply that further expansions of the array will avoid reallocation", so, formally, everything is by the letter of the law so far.

However, inserting another "s.reserve (n * 2)" just after "s.length--" does not help either. And I think the whole thing does contradict the intent described in TDPL section 4.1.9 ("Expanding"). Here is the relevant quote: "If there is no slack space left, ... The allocator may find an empty block to the right of the current block and gobble it into the current block. This operation is known as coalescing". Now, this quote and its context give no formal guarantee that the memory allocator works exactly like that, but they definitely sound like the right thing to do.
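
In code, that attempt looks like this (the debug output and the NOGC switch are stripped, and I import std.array explicitly just to keep the snippet self-contained):

-----
import std.array;
import std.range;

void main ()
{
        int n = 1_000_000;
        auto s = array (iota (n));
        s.reserve (n * 2);
        foreach (i; 0..n)
        {
                s.length++;
                s.length--;
                s.reserve (n * 2); // re-request the slack after each shrink: no luck
        }
}
-----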

I hope many would agree that reducing the length once does not at all imply that we want to reduce it further. Frankly, until now I have thought that D dynamic arrays can be used as queue and stack implementations done "right", i.e., efficiently.
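
To make the pattern concrete, here is a toy sketch of what I mean by "an array used as a stack" (not the program that timed out, just the shape of the code):

-----
import std.stdio;

void main ()
{
        int [] stack;
        stack.reserve (100); // hope: allocate once, never relocate afterwards

        // push: append at the end
        foreach (i; 0..5)
        {
                stack ~= i * i;
        }

        // pop: read the top element, then drop it by shrinking the length
        while (stack.length > 0)
        {
                writeln (stack[$ - 1]);
                stack.length--;
        }
}
-----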

So my two questions are:

1. I would like to have a way to reduce the length of the array without losing the guarantees provided by the preceding "s.reserve". Is that possible in the current D implementation? If so, how?

2. Ideally, I would like a dynamic array in D to act efficiently as a stack; that is, the amortized cost of N stack operations should be O(N). To achieve this, I would propose "lazy" reduction of the space reserved for the array. I suppose the implementation already does something similar for expanding arrays, as shown in the example at http://dlang.org/arrays.html#resize : when the capacity is exhausted, the next expansion roughly doubles the allocated size of the array. The very same trick could be used for shrinking: instead of reducing the capacity to zero (i.e., keeping a guarantee only for the exact amount in use), the memory manager could leave the allocated size equal to, for example, min (prev, cur * 2), where prev is the allocated size before the shrinking and cur is the size in use after the shrinking.
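
In rough code, the policy I have in mind is just this (the function name and the bookkeeping parameters are made up; it is a sketch of the idea, not of how druntime actually organizes things):

-----
import std.algorithm;

// Hypothetical: how much space to keep reserved for an array after it shrinks.
// prevReserved - elements reserved before the shrink,
// curUsed      - elements still in use after the shrink.
size_t reservedAfterShrink (size_t prevReserved, size_t curUsed)
{
        // Keep up to twice the used size, but never enlarge the reservation
        // just because of a shrink.
        return min (prevReserved, curUsed * 2);
}
-----

With such a rule, a sequence of N pushes and pops like the one in my example would keep reusing the same block instead of losing its reserve on every pop.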

I suspect that the above suggestion could conflict with some other array usage patterns, because the array syntax actually deals with array views, not with arrays "in the flesh" (in memory). One case I can imagine is the following:

-----
import std.range;
import std.stdio;

void main ()
{
        auto a = array (iota (30)); // [0, 1, ..., 29]
        auto b = a[10..$]; // [10, 11, ..., 29]
        b.length -= 10; // *** what is b.capacity now?
        b.length += 10; // now b should be [10, 11, ..., 19, 0, 0, ..., 0]
        writeln (b);
}
-----

Now, at the line marked ***, if the memory manager keeps the 10 freed slots as spare capacity for b, that spare space points at memory still occupied by the array a (its elements 20 through 29): the subsequent "b.length += 10" could then re-expose or overwrite a's data instead of appending fresh zeroes, which does not sound right. As I'm not a D expert (yet? ;) - such investigations are insightful), I don't know right now whether this problem is solvable. Please share your thoughts on that.

But at least when there are no other views into the memory past the end of b (as in my first example, and as is generally true when an array is used as a stack), the memory manager could detect that and make an optimal decision regarding b.capacity.
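
Roughly, the check I imagine is the following (the names for the block bookkeeping are invented; I do not know what the real memory manager actually tracks):

-----
// Hypothetical: may a slice keep spare capacity after shrinking?
// blockStart and blockUsed would come from the memory manager's bookkeeping
// for the block the slice lives in.
bool mayKeepCapacity (const(int)[] slice, const(int)* blockStart, size_t blockUsed)
{
        // Safe only if the slice ends exactly where the block's used data ends:
        // then no other view owns the elements just past the slice.
        return slice.ptr + slice.length == blockStart + blockUsed;
}
-----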

Thank you for reading to this point; I confess this was rather lengthy.

-----
Ivan Kazmenko.
