On Thu, 24 Dec 2009 13:39:21 -0500, grauzone <[email protected]> wrote:


> Sorry about that, I didn't have a close look at the patch. I guess I was thinking more of Andrei's original proposal (and how I imagined it might be implemented).

No problem.


> It seems you store the length field inside the array's memory block (instead of in the cache, which just speeds up gc_query). That's awesome, but I'm going to complain again: now you have to keep a length field for *all* memory allocations, not just arrays! For most object allocations, that means 4 bytes of additional overhead.

Interestingly enough, the storage overhead is zero except for memory blocks larger than 256 bytes. I'll explain:

A probably little-known piece of trivia: D allocates 1 extra byte for every array allocation. Why would it do this, you ask? Because of the GC.

Imagine that it did not. What happens when you do something like this?

ubyte[] array = new ubyte[16]; // allocates a 16-byte block for the array
array = array[$..$];           // empty slice just past the end

If you look at array's ptr member, it no longer points into the allocated block, but at the start of the *next* block. The GC no longer needs to keep the original block alive, but the stale pointer now pins the next block and keeps it from being collected. In addition, if you tried appending to array, it might start writing into unallocated memory!
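To make the problem concrete, here is a minimal sketch (in Python, with hypothetical addresses and bin sizes) of why the padding byte matters:

```python
# Illustrative sketch only: why D pads array allocations by one byte.
# Without padding, an empty tail slice points at the *next* GC block.

BLOCK_BASE = 0x1000   # hypothetical start of a 16-byte GC block
BLOCK_SIZE = 16

# ubyte[] array = new ubyte[16];  array = array[$..$];
# The resulting ptr is base + length:
tail_ptr_unpadded = BLOCK_BASE + 16
# ... which is exactly the start of the next block -- the GC now pins
# a block the program never allocated for this array.
assert tail_ptr_unpadded == BLOCK_BASE + BLOCK_SIZE

# With one byte of padding, a 16-byte array request needs 17 bytes, so it
# lands in the next-larger bin (32 bytes, assuming power-of-2 bins).  Now
# base + 16 still lies strictly inside the allocated block.
padded_block_size = 32
tail_ptr_padded = BLOCK_BASE + 16
assert BLOCK_BASE <= tail_ptr_padded < BLOCK_BASE + padded_block_size
```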

So that last byte was already in use, and I commandeered it for length storage. For blocks up to and including 256 bytes, I use the last byte of the block as the length field. For blocks of 512 bytes up to a half page (2048 bytes), I use the last 2 bytes, so there is one byte of overhead beyond the current implementation.
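As a rough sketch (Python; the exact offsets are my reading of the scheme described here, not the actual druntime code), the length-field placement for binned blocks looks like this:

```python
def length_field_info(block_size):
    """Sketch of where the array length lives in a binned GC block.

    Returns (offset_of_length_field, field_size_in_bytes, usable_capacity).
    Blocks up to 256 bytes use the last byte (a 1-byte field can hold any
    length that fits in such a block); 512..2048-byte blocks use the last
    2 bytes.  Illustrative only -- not the real druntime implementation.
    """
    assert block_size <= 2048, "page-sized blocks follow different rules"
    field = 1 if block_size <= 256 else 2
    return block_size - field, field, block_size - field

# A 16-byte bin: the length lives in the final byte, 15 bytes usable.
# The padding byte that was already "wasted" now doubles as length storage,
# so the net storage overhead is zero.
assert length_field_info(16) == (15, 1, 15)
# A 512-byte bin: a 2-byte length field, i.e. one byte more than before.
assert length_field_info(512) == (510, 2, 510)
```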

Blocks larger than that follow different rules: they are not stored in bins, but allocated a whole page at a time. Such a block can be extended by appending more pages if they are free, so the length cannot live at the end of the block, because the end may move. Instead I store it at the beginning, and I use up a full 8 bytes. The reason is alignment: I don't know what you're putting in the array, so I must keep the data 8-byte aligned.
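The layout for these page-granularity blocks can be sketched like so (Python; the 4096-byte page size and helper name are my assumptions for illustration):

```python
PAGE = 4096    # assumed page size (half page = 2048, per the bin limit above)
PREFIX = 8     # 8-byte length prefix keeps the user data 8-byte aligned

def large_block_layout(block_base, n_pages):
    """Sketch (hypothetical helper): for page-granularity blocks the length
    is stored up front, because the block may later grow by appending free
    pages -- the end of the block can move, but the beginning never does.
    Returns (length_field_addr, data_addr, usable_bytes)."""
    data = block_base + PREFIX
    assert data % 8 == 0, "user data must stay 8-byte aligned"
    return block_base, data, n_pages * PAGE - PREFIX

# A one-page block: length at the base, data starts 8 bytes in.
base = 0x10000
assert large_block_layout(base, 1) == (base, base + 8, 4096 - 8)
```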

For classes, I allocate the required extra data as if the class were an array of one element of the class's data size, and then set the "ghost" length at the end of the block to 0. If the class data exceeds a half page, the ghost length sits at the beginning, where the vtable pointer is, so it's extremely unlikely to accidentally match a valid length. Note that this makes little difference in most cases: the block used for the class is a power of 2 anyway, so there is usually plenty of wasted space at the end.

I found out during testing that allocating a new struct is equivalent to allocating a new array of that struct type with length 1 and returning its pointer, so that aspect is already covered.

> Also, if you use GC.malloc directly and the user tries to append to slices of it, your code may break. GC.malloc doesn't seem to pad the memory block with a length field.

Yes, this is a possible problem. However, using GC.malloc directly and then treating the result as a normal array is probably extremely rare; at the least, it is not valid in SafeD. It probably should be documented as a danger of GC.malloc, but I don't think it will adversely affect much code. There will also be functions that obviate the need to call GC.malloc to allocate an array (specifically, for allocating uninitialized data).

> I must say that I find your argument strange: didn't you say adding an additional field to the slice struct is too much overhead?

That's overhead when passing slices around, not storage overhead: for example, pushing 12 bytes onto the stack instead of 8 when calling a function that takes a slice. If you want safe appends, you have to store the allocated length somewhere; there's no way around that.

> Also, a solution which keeps a pointer to the array length in the slice struct would still be faster. The cache lookup cost is not zero.

This is the trade-off I chose between performance when passing slices around and performance when appending. If you focus all your performance concerns on appending, other areas suffer. I don't know whether what I chose is the best solution, but I can't think of any way to have *both* speedy slice usage and near-zero append overhead. If someone can think of a better scheme, I'll be happy to incorporate it, but after using and understanding slices while developing for Tango (which uses slices to get every ounce of performance!), I'm convinced that as-fast-as-possible slice semantics for passing data around is essential.

This is where a custom type that focuses performance on appending, at the expense of other operations, is good to have. I think applications that need the absolute best append performance and nothing else are pretty rare.

> Anyway, now that the semantics and performance are somewhat sane, none of these remaining issues are too important. Thanks Steve and Andrei!

Cool!

-Steve
