On 8/16/22 19:33, Steven Schveighoffer wrote:

> Everything in the memory allocator is in terms of pages. A pool is a
> section of pages. The large blocks are a *multiple* of pages, whereas
> the small blocks are pages that are *divided* into same-sized chunks.

Thank you. I am appreciating this discussion a lot.

> When you want to allocate a block, if it's a half-page or less, then it
> goes into the small pool, where you can't glue 2 blocks together.

That's how it should be, because we wouldn't want to do the extra work of tracking which blocks are glued together.

>>  > The reason why your `bad` version fails is because when it must
>>  > reallocate, it still is only allocating 1 element.

I will get back to this 1 element allocation below.

>> That part I still don't understand. The same block of e.g. 16 bytes
>> still has room. Why not use that remaining portion?
>
> It does until it doesn't. Then it needs to reallocate.

I think my problem was caused by my test data being 16 bytes: the runtime (I think) picked memory from the 16-byte pool, with no hope of future growth into an adjacent block.

Using a 16-byte block sounds like a good strategy at first because nobody knows whether an array will get more than one element.

However, if my guess is correct (i.e. the first 16-byte element is placed in a 16-byte block), then appending the second element will always trigger a new allocation.
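A minimal sketch of how one could check this guess, watching .ptr and .capacity across appends; the 16-byte struct `S` is just a stand-in for my test data:

```d
import std.stdio;

struct S { long a; long b; }  // 16 bytes, like my test data

void main()
{
    auto arr = new S[1];
    writefln("initial capacity: %s", arr.capacity);

    foreach (i; 0 .. 4)
    {
        const oldPtr = arr.ptr;
        arr ~= S(i, i);
        // If every append reallocates, ptr changes every time.
        writefln("append %s: ptr %s (capacity %s)",
                 i, oldPtr is arr.ptr ? "unchanged" : "changed", arr.capacity);
    }
}
```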

One might argue that dynamic arrays are likely to have more than a single element, so the initial block should be at least twice the element size. That would save one allocation for every array, and in my case of 1-element arrays it would halve the allocation count. (Right now I pay for every single append.)

Of course, I can understand that there can be applications where a large number of arrays (e.g. a 2D array) may have zero or one element, in which case my proposal of placing the first element in, say, a 32-byte block would be wasteful.

I think such cases are rare, whereas the current strategy makes every array pay one extra allocation.

> Maybe some ASCII art? `A` is "used by the current slice", `x` is
> "allocated, but not referenced by the array". `.` is "part of the block
> but not used (i.e. can grow into this)". `M` is "metadata"
>
> ```d
> auto arr = new ubyte[10];      // AAAA AAAA AA.. ...M
> arr = arr[1 .. $];             // xAAA AAAA AA.. ...M
> arr ~= 0;                      // xAAA AAAA AAA. ...M
> arr ~= 0;                      // xAAA AAAA AAAA ...M
> arr = arr[3 .. $];             // xxxx AAAA AAAA ...M
> arr ~= cast(ubyte[])[1, 2, 3]; // xxxx AAAA AAAA AAAM // full!
> arr ~= 1;                      // AAAA AAAA AAAA ...M // reallocated!
> // arr.ptr has changed
> ```

That all makes sense. I didn't expect the metadata to be at the end, but I sense it's related to the "end slice", so the end is the better place for it. (?)

> Metadata isn't always stored. Plus, it's optimized for the block size.
> For example, any block that is 256 bytes or less only needs a single
> byte to store the "used" space.

That's pretty interesting and smart.

> What is your focus? Why do you really want this "optimization" of gluing
> together items to happen?

This is about what you and I talked about in the past and something I mentioned in my DConf lightning talk this year. I am imagining a FIFO-like cache where elements of a source range are stored. There is a sliding window involved. I want to drop the unused front elements because they are expensive in 2 ways:

1) If the elements are not needed anymore, I have to move my slice forward so that the GC collects their pages.

2) If they are not needed anymore, I don't even want to copy them to a new block, because that would be expensive and, in the case of an infinite source range, impossible.

The question is when to apply this dropping of old front elements. When I need to add one more element to the array, I can detect whether the append *may* allocate with the expression 'arr.length == arr.capacity'. But it is only a "may": the runtime might still give me adjacent room without reallocating. So I could delay the "drop the front elements" computation, because no actual copying would happen at that point.
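To illustrate, here is a minimal sketch of the idea; `appendSliding` and the `window` parameter are hypothetical names of mine, not what my actual code uses:

```d
import std.stdio;

// Before an append that *may* reallocate, slice off the stale front
// elements so that, if the runtime does reallocate, only the live
// window is copied and the old block can be collected.
void appendSliding(T)(ref T[] arr, T element, size_t window)
{
    if (arr.length == arr.capacity)
    {
        // The append below *may* reallocate (the runtime can still
        // extend in place), so this is the cheapest moment to drop
        // the elements we no longer need.
        if (arr.length > window)
            arr = arr[$ - window .. $];
    }
    arr ~= element;
}

void main()
{
    int[] arr;
    foreach (i; 0 .. 100)
        appendSliding(arr, i, 4);
    writeln(arr.length, " elements remain: ", arr);
}
```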

But the bigger issue is that, because I drop elements, my array never gets large enough to take advantage of this optimization, so there is an allocation for every single append.

> https://dlang.org/phobos/core_memory.html#.GC.extend

Ok, that sounds very useful. In addition to "I can detect when it *may* allocate", I can detect whether there is adjacent free room. (I can ask for just a 1-element extension; I tested it, and it works.) (I guess GC.extend has to grab the GC lock.)

However, for that to work, I seem to need the initial block pointer that the GC knows about. (My sliding array's .ptr does not work, so I have to save the initial arr.ptr.)
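Here is a minimal sketch of how I imagine using it; the sizes are arbitrary, and whether the extension succeeds depends on whether a free adjacent page happens to exist:

```d
import core.memory : GC;
import std.stdio;

struct S { long a; long b; }  // 16-byte elements, as in my tests

void main()
{
    // Large enough to land in the page-based pool.
    auto arr = new S[200];
    auto base = arr.ptr;   // must remember the block base; an interior
                           // pointer from the sliding slice makes extend return 0

    arr = arr[100 .. $];   // simulate the sliding window

    // Ask for at least one more element's worth of bytes, in place.
    const newSize = GC.extend(base, S.sizeof, S.sizeof);
    if (newSize != 0)
        writefln("extended in place; the block is now %s bytes", newSize);
    else
        writeln("no adjacent room; an append would have to reallocate");
}
```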

Conclusion:

1) Although understanding the inner workings of the runtime is very useful and core.memory has interesting functions, it feels like too much fragile work to get exactly what I want. I should manage my own memory (likely still backed by the GC).

2) I argue that the initial allocation should be more than 1 element, so that we skip one allocation for most arrays and save 50% of allocations in my window-of-1 sliding-window case.

Ali
