On 8/16/22 19:33, Steven Schveighoffer wrote:

> Everything in the memory allocator is in terms of pages. A pool is a
> section of pages. The large blocks are a *multiple* of pages, whereas
> the small blocks are pages that are *divided* into same-sized chunks.

Thank you. I am appreciating this discussion a lot.

> When you want to allocate a block, if it's a half-page or less, then it
> goes into the small pool, where you can't glue 2 blocks together.

That's how it should be, because we wouldn't want to do the extra work of tracking which blocks are glued together.

>>  > The reason why your `bad` version fails is because when it must
>>  > reallocate, it still is only allocating 1 element.

I will get back to this 1 element allocation below.

>> That part I still don't understand. The same block of e.g. 16 bytes
>> still has room. Why not use that remaining portion?
>
> It does until it doesn't. Then it needs to reallocate.

I think my problem was caused by my test data being 16 bytes: the runtime (I think) picked memory from the 16-byte pool, with no hope of future growth into an adjacent block.

Using a 16-byte block sounds like a good strategy at first because nobody knows whether an array will get more than one element.

However, if my guess is correct (i.e. the first 16-byte element is placed in a 16-byte block), then appending the second element will always trigger a new allocation.
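A minimal sketch of how one could check this guess, watching .ptr and .capacity across appends; the 16-byte struct `S` is just a stand-in for my test data:

```d
import std.stdio;

struct S { long a; long b; }  // 16 bytes, like my test data

void main()
{
    auto arr = new S[1];
    writefln("initial capacity: %s", arr.capacity);

    foreach (i; 0 .. 4)
    {
        const oldPtr = arr.ptr;
        arr ~= S(i, i);
        // If every append reallocates, ptr changes every time.
        writefln("append %s: ptr %s (capacity %s)",
                 i, oldPtr is arr.ptr ? "unchanged" : "changed", arr.capacity);
    }
}
```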

One might argue that dynamic arrays are likely to have more than a single element, so the initial block should be at least twice the element size. That would save one allocation for every array, and in my case of 1-element arrays it would halve the allocation count. (Right now I pay for every single append.)

Of course, I can understand that there can be applications where a large number of arrays (e.g. a 2D array) may have zero or one element, in which case my proposal of placing the first element in, say, a 32-byte block would be wasteful.

I think such cases are rare, whereas the current strategy makes every array pay one extra allocation.

> Maybe some ASCII art? `A` is "used by the current slice", `x` is
> "allocated, but not referenced by the array". `.` is "part of the block
> but not used (i.e. can grow into this)". `M` is "metadata"
>
> ```d
> auto arr = new ubyte[10];      // AAAA AAAA AA.. ...M
> arr = arr[1 .. $];             // xAAA AAAA AA.. ...M
> arr ~= 0;                      // xAAA AAAA AAA. ...M
> arr ~= 0;                      // xAAA AAAA AAAA ...M
> arr = arr[3 .. $];             // xxxx AAAA AAAA ...M
> arr ~= cast(ubyte[])[1, 2, 3]; // xxxx AAAA AAAA AAAM // full!
> arr ~= 1;                      // AAAA AAAA AAAA ...M // reallocated!
> // arr.ptr has changed
> ```

That all makes sense. I didn't expect the metadata to be at the end, but I sense it's related to the "end slice", so the end is the better place for it. (?)

> Metadata isn't always stored. Plus, it's optimized for the block size.
> For example, any block that is 256 bytes or less only needs a single
> byte to store the "used" space.

That's pretty interesting and smart.

> What is your focus? Why do you really want this "optimization" of gluing
> together items to happen?

This is about what you and I talked about in the past and something I mentioned in my DConf lightning talk this year. I am imagining a FIFO-like cache where elements of a source range are stored. There is a sliding window involved. I want to drop the unused front elements because they are expensive in 2 ways:

1) If the elements are not needed anymore, I have to move my slice forward so that the GC collects their pages.

2) If they are not needed anymore, I don't even want to copy them to a new block, because that would be expensive and, in the case of an infinite source range, impossible.

The question is when to apply this dropping of old front elements. When I need to add one more element to the array, I can detect whether the append *may* allocate with the expression 'arr.length == arr.capacity'. But it is only a "may": the runtime might still give me adjacent room without reallocating. So I could delay the "drop the front elements" computation, because no actual copying would happen at that point.
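To illustrate, here is a minimal sketch of the idea; `appendSliding` and the `window` parameter are hypothetical names of mine, not what my actual code uses:

```d
import std.stdio;

// Before an append that *may* reallocate, slice off the stale front
// elements so that, if the runtime does reallocate, only the live
// window is copied and the old block can be collected.
void appendSliding(T)(ref T[] arr, T element, size_t window)
{
    if (arr.length == arr.capacity)
    {
        // The append below *may* reallocate (the runtime can still
        // extend in place), so this is the cheapest moment to drop
        // the elements we no longer need.
        if (arr.length > window)
            arr = arr[$ - window .. $];
    }
    arr ~= element;
}

void main()
{
    int[] arr;
    foreach (i; 0 .. 100)
        appendSliding(arr, i, 4);
    writeln(arr.length, " elements remain: ", arr);
}
```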

But the bigger issue is that, because I drop elements, my array never gets large enough to take advantage of this optimization, so there is an allocation for every single append.

> https://dlang.org/phobos/core_memory.html#.GC.extend

Ok, that sounds very useful. In addition to "I can detect when it *may* allocate", I can detect whether there is adjacent free room. (I can ask for just a 1-element extension; I tested it, and it works.) (I guess GC.extend has to grab the GC lock.)

However, for that to work, I seem to need the initial block pointer that the GC knows about. (My sliding array's .ptr does not work, so I have to save the initial arr.ptr.)
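Here is a minimal sketch of how I imagine using it; the sizes are arbitrary, and whether the extension succeeds depends on whether a free adjacent page happens to exist:

```d
import core.memory : GC;
import std.stdio;

struct S { long a; long b; }  // 16-byte elements, as in my tests

void main()
{
    // Large enough to land in the page-based pool.
    auto arr = new S[200];
    auto base = arr.ptr;   // must remember the block base; an interior
                           // pointer from the sliding slice makes extend return 0

    arr = arr[100 .. $];   // simulate the sliding window

    // Ask for at least one more element's worth of bytes, in place.
    const newSize = GC.extend(base, S.sizeof, S.sizeof);
    if (newSize != 0)
        writefln("extended in place; the block is now %s bytes", newSize);
    else
        writeln("no adjacent room; an append would have to reallocate");
}
```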

Conclusion:

1) Although understanding the inner workings of the runtime is very useful and core.memory has interesting functions, it feels like too much fragile work to get exactly what I want. I should manage my own memory (likely still backed by the GC).

2) I argue that the initial allocation should be more than 1 element, so that we skip one allocation for most arrays and save 50% of allocations in my window-of-1 sliding-window case.

Ali
