On 9/23/13 11:06 PM, Manu wrote:
On 24 September 2013 15:31, Andrei Alexandrescu wrote:
On 9/23/13 9:56 PM, Manu wrote:
You can't go wasting GPU memory by overallocating every block.
Only the larger chunk may need to be overallocated if all
allocations are then rounded up.
I don't follow.
If I want to allocate 4k aligned, then 8k will be allocated (because it
wants to store an offset).
What do extant GPU allocators do here?
Any smaller allocation, let's say 16 bytes, will round up to 4k. You
can't waste precious GPU RAM like that.
That's easy, you just segregate allocations by size.
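To make "segregate allocations by size" concrete, here is a minimal sketch in D. The class boundaries and the name sizeClassIndex are assumptions made up for illustration; nothing here describes how an existing GPU allocator works. The idea is only that a 16-byte request is routed to a pool of 16-byte slots instead of being rounded up to a 4k block.

```d
// Hypothetical size classes, chosen only for illustration.
immutable size_t[] sizeClasses = [16, 64, 256, 1024, 4096];

// Returns the index of the smallest class that can hold `bytes`,
// or sizeClasses.length when the request is larger than every class
// and would have to go to a separate large-block path.
size_t sizeClassIndex(size_t bytes) pure nothrow @nogc @safe
{
    foreach (i, classSize; sizeClasses)
    {
        if (bytes <= classSize)
            return i;
    }
    return sizeClasses.length;
}
```

Each class would then be backed by its own pool, so only the 4096 class (and the large-block path) ever deals with 4k-aligned chunks.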
A minimum and a maximum (guaranteed without over-allocating) alignment
may be useful.
What are the semantics of the minimum?
But I think allocators need to be given the opportunity to do the best
they can.
It's definitely important that allocators are able to receive an
alignment request, and that they are given the opportunity to fulfill a
dynamic alignment request without always resorting to an over-allocation
strategy.
I'd need a bit of convincing. I'm not sure everybody needs to pay
for a few, and it is quite possible that malloc_align suffers from
the same fragmentation issues as the next guy. Also, there's always
the possibility of leaving some bits to lower-level functions.
What are they paying exactly? An extra arg to allocate that can probably
be defaulted?
void[] allocate(size_t bytes, size_t align = this.alignment) shared;
For allocating relatively small objects (say up to 32K), we're looking
at tens of cycles, no more. An extra argument needs to be passed around
and more importantly looked at and acted upon. At this level it's a
serious dent in the time budget.
Part of the matter is that small objects must in a way be the fastest to
allocate. For larger objects, it is true to some extent that the caller
will do some work with the obtained memory, which offsets the relative
cost of allocation. (That being said, Jason Evans told me you can't
always assume the caller will do at least "a memset amount of work".)
Anyhow it stands to reason that you don't want to pay for matters
related to alignment without even looking.
Or is it the burden of adding the overallocation boilerplate logic to
each allocator for simple allocators that don't want to deal with
alignment in a conservative way?
I imagine that could possibly be automated, the boilerplate could be
given as a library.
void[] allocate(size_t size, size_t alignment)
{
    size_t allocSize =
        std.allocator.getSizeCompensatingForAlignment(size, alignment);
    void[] mem = ...; // allocation logic using allocSize
    // adjusts the range, and maybe writes the offset to the prior bytes
    return std.allocator.alignAllocation(mem, alignment);
}
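For reference, here is one way the two helpers named above could look. This is a sketch under assumptions, not an existing std.allocator API: the helper bodies, the use of malloc/free as the backing store, and the deallocateAligned function are all hypothetical. The parameter is called alignment because align is a reserved word in D. It also shows where the "4k aligned costs 8k" figure comes from: the worst case is size + alignment plus room for the stored offset.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: worst-case raw size so that an aligned block of
// `size` bytes plus a size_t offset stored just before it always fits.
size_t getSizeCompensatingForAlignment(size_t size, size_t alignment) pure nothrow @nogc
{
    // Assumption: power-of-two alignment, at least size_t.sizeof so the
    // stored offset is itself properly aligned.
    assert(alignment >= size_t.sizeof && (alignment & (alignment - 1)) == 0);
    return size + alignment + size_t.sizeof;
}

// Hypothetical helper: narrows `mem` to its aligned sub-range and writes
// the distance back to the raw block into the bytes just before the result.
void[] alignAllocation(void[] mem, size_t alignment) nothrow @nogc
{
    immutable raw = cast(size_t) mem.ptr;
    immutable size = mem.length - alignment - size_t.sizeof;
    // Leave room for the offset, then round up to the alignment.
    immutable aligned = (raw + size_t.sizeof + alignment - 1) & ~(alignment - 1);
    auto p = cast(ubyte*) aligned;
    *(cast(size_t*) p - 1) = aligned - raw;
    return p[0 .. size];
}

// Over-allocating allocate, with malloc standing in for the real allocator.
void[] allocate(size_t size, size_t alignment) nothrow @nogc
{
    immutable allocSize = getSizeCompensatingForAlignment(size, alignment);
    auto raw = malloc(allocSize);
    if (raw is null)
        return null;
    return alignAllocation(raw[0 .. allocSize], alignment);
}

// Reads the stored offset to recover the pointer malloc returned.
void deallocateAligned(void[] block) nothrow @nogc
{
    if (block.ptr is null)
        return;
    auto p = cast(ubyte*) block.ptr;
    immutable offset = *(cast(size_t*) p - 1);
    free(p - offset);
}
```

The extra bytes are exactly the cost being argued about: an allocator that knows its own layout can often satisfy the same request without them.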
One possibility I'm thinking of is to make maximum alignment a static
property of the allocator. It may be set during runtime, but a given
allocator object has one defined maximum alignment. Would that be
satisfactory to all?
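A rough sketch of what that could mean, assuming a simple bump-pointer region: the alignment is chosen once when the allocator object is built and stays fixed afterwards, so allocate takes no alignment argument. The type name AlignedRegion and its layout are hypothetical, invented only to illustrate the idea.

```d
// Hypothetical bump-pointer region whose alignment is fixed per object.
struct AlignedRegion
{
    private ubyte[] buffer;
    private size_t used;
    immutable size_t alignment; // set once at construction, never changed

    this(ubyte[] buffer, size_t alignment)
    {
        // Assumption: alignment is a power of two.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        this.buffer = buffer;
        this.alignment = alignment;
    }

    // Every returned block starts at a multiple of `alignment`.
    void[] allocate(size_t bytes)
    {
        immutable base = cast(size_t) buffer.ptr;
        // Round the next free address up, then convert back to an index.
        immutable start = ((base + used + alignment - 1) & ~(alignment - 1)) - base;
        if (start > buffer.length || bytes > buffer.length - start)
            return null; // out of space
        used = start + bytes;
        return buffer[start .. start + bytes];
    }
}
```

A caller that needs two different alignments would create two such objects, which keeps the per-call path free of any alignment checks.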
Andrei