On Tuesday, 24 September 2013 at 16:06:39 UTC, Andrei Alexandrescu wrote:
> On 9/24/13 4:38 AM, Dan Schatzberg wrote:
>> One thing I'm not sure is addressed by this design is memory
>> locality. I know of libnuma (http://linux.die.net/man/3/numa), which
>> lets me express, at run time and for each allocation, which NUMA
>> domain my memory should be allocated from.
>>
>> In the case that I want to allocate memory in a specific NUMA
>> domain (not just local vs. non-local), I believe this design is
>> insufficient, because the number of domains is only known at run
>> time.
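To illustrate why the node count cannot be a compile-time parameter, here is a minimal C sketch that discovers it at run time. It assumes Linux, where /sys/devices/system/node/online lists the online nodes in a range format such as `0-3` or `0,2-3`; the function names are hypothetical, and a real program would more likely use libnuma's own queries such as numa_max_node().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Count the nodes in a range list such as "0", "0-3" or "0,2-3"
 * (the format of /sys/devices/system/node/online).
 * Returns -1 on malformed input. */
static int count_nodes_in_list(const char *s) {
    assert(s != NULL);
    int count = 0;
    while (*s != '\0' && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        if (end == s) return -1;          /* expected a number */
        long hi = lo;
        if (*end == '-') {                /* a range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo) return -1;
        }
        count += (int)(hi - lo + 1);
        s = end;
        if (*s == ',') s++;               /* move to the next entry */
    }
    return count;
}

/* Number of online NUMA nodes, discovered at run time. Falls back to 1
 * (treat the machine as a single node) when the sysfs file is missing,
 * e.g. on non-Linux systems or kernels built without NUMA support. */
static int online_numa_nodes(void) {
    char buf[256];
    FILE *f = fopen("/sys/devices/system/node/online", "r");
    if (f == NULL) return 1;
    int n = -1;
    if (fgets(buf, sizeof buf, f) != NULL) n = count_nodes_in_list(buf);
    fclose(f);
    return n > 0 ? n : 1;
}
```

An allocator that accepts a node index per request would have to validate it against a value like `online_numa_nodes()`, which is only available once the program is running.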
>> Also, as far as alignment is concerned, I will add that x86 is
>> relatively unusual in having a statically known cache-line size.
>> On both ARM and PowerPC, the cache-line size can differ from core
>> to core. I feel this is a significant argument for the ability to
>> express alignment dynamically.
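A minimal sketch of expressing alignment dynamically in C: the cache-line size is queried at run time and passed to posix_memalign(). It assumes glibc, whose sysconf() accepts the extension `_SC_LEVEL1_DCACHE_LINESIZE`; on other C libraries, or when the value is not reported, it falls back to an assumed 64 bytes. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback when the line size cannot be queried. */
#define FALLBACK_CACHE_LINE 64

static int is_power_of_two(size_t x) { return x != 0 && (x & (x - 1)) == 0; }

static int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}

/* L1 data-cache line size, queried at run time. sysconf() may return 0 or -1
 * when the kernel does not report the value, so use the fallback then. */
static size_t cache_line_size(void) {
    long n = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    if (n <= 0 || !is_power_of_two((size_t)n) || (size_t)n < sizeof(void *))
        return FALLBACK_CACHE_LINE;
    return (size_t)n;
}

/* Allocate `size` bytes aligned to a boundary chosen at run time.
 * posix_memalign requires a power of two that is a multiple of sizeof(void *). */
static void *alloc_aligned(size_t size, size_t alignment) {
    assert(is_power_of_two(alignment));
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0) return NULL;
    return p;
}
```

Usage: `void *buf = alloc_aligned(n, cache_line_size());` yields a buffer aligned to whatever line size the current machine reports, which a compile-time constant cannot guarantee across ARM or PowerPC implementations.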
> Could you send a few links so I can take a look?
>
> My knee-jerk reaction to this is that NUMA allocators would
> provide their own additional primitives and not participate
> naively in compositions with other allocators.
>
> Andrei
Not sure what kind of links you're looking for, but the following
article is a good discussion of the issue and of the current
solutions:
http://queue.acm.org/detail.cfm?id=2513149

In particular:
"The application may want fine-grained control of how the
operating system handles allocation for each of its memory
segments. For that purpose, system calls exist that allow the
application to specify which memory region should use which
policies for memory allocations.
The main performance issues typically involve large structures
that are accessed frequently by the threads of the application
from all memory nodes and that often contain information that
needs to be shared among all threads. These are best placed using
interleaving so that the objects are distributed over all
available nodes."
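The interleaving described in the quoted passage can be modeled as round-robin assignment of pages to the allowed nodes. The following C sketch is a simplified illustration under that assumption (a caller-supplied list of allowed nodes, 4 KiB pages in the usage), not the kernel's actual placement code; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Node that backs the page containing `byte_offset` within a region,
 * assuming pages are dealt out round-robin over `nodes`. */
static int interleave_node(size_t byte_offset, size_t page_size,
                           const int *nodes, size_t node_count) {
    assert(page_size > 0 && node_count > 0);
    size_t page = byte_offset / page_size;
    return nodes[page % node_count];
}

/* How many pages of a `region_size`-byte region land on each allowed node;
 * counts[i] corresponds to nodes[i]. */
static void pages_per_node(size_t region_size, size_t page_size,
                           size_t node_count, size_t *counts) {
    assert(page_size > 0 && node_count > 0);
    for (size_t i = 0; i < node_count; i++) counts[i] = 0;
    for (size_t off = 0; off < region_size; off += page_size)
        counts[(off / page_size) % node_count]++;
}
```

With three nodes and a 10-page region, the pages split 4/3/3, so a large shared structure is spread nearly evenly and no single node's memory controller serves every access.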
The Linux/libc interfaces are linked in my first message.
Specifically, the mbind() call lets one specify the policy for
allocations from a virtual address range (i.e., which NUMA node the
backing physical pages are allocated from). More generally, you
could imagine specifying this per allocation.
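A minimal Linux-only sketch of per-allocation placement with mbind(): each call maps a fresh anonymous region and binds it to one node with MPOL_BIND. To avoid depending on libnuma's <numaif.h>, it invokes the system call through syscall(SYS_mbind, ...) and defines MPOL_BIND's value (2, as in <linux/mempolicy.h>) itself; `alloc_on_node` is a hypothetical name. Physical pages are only allocated when first touched, which is when the policy takes effect. mbind() can fail on kernels without NUMA support or in sandboxes that filter the call (for example some container setups), so the sketch reports the error instead of treating it as fatal.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value of MPOL_BIND from <linux/mempolicy.h>, defined here so the sketch
 * does not need libnuma's headers. */
#ifndef MPOL_BIND
#define MPOL_BIND 2
#endif

/* Map `len` bytes and ask the kernel to back them with pages from `node`.
 * The mapping is returned even if mbind() fails; *bind_result receives 0 on
 * success or the errno of the failed call. Returns NULL if mmap() fails. */
static void *alloc_on_node(size_t len, int node, int *bind_result) {
    assert(node >= 0 && node < (int)(sizeof(unsigned long) * CHAR_BIT));
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return NULL;

    unsigned long mask = 1UL << node;  /* one bit per node */
#ifdef SYS_mbind
    /* maxnode is passed as the mask width plus one, as libnuma does. */
    long rc = syscall(SYS_mbind, p, (unsigned long)len, MPOL_BIND,
                      &mask, (unsigned long)(sizeof mask * CHAR_BIT + 1), 0U);
    *bind_result = (rc == 0) ? 0 : errno;
#else
    (void)mask;
    *bind_result = ENOSYS;
#endif
    return p;
}
```

Because the node is an ordinary argument, the choice can differ for every allocation and can depend on a node count discovered at run time; the region is released with munmap().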
What is your objective, though? Aren't you trying to define a
hierarchy of allocators in which more specific allocators are
composed from general ones? In that case, what is the concern with
including locality at the base level? Locality is one characteristic
of memory that programmers might care about, and you can rather
trivially compose a non-locality-aware allocator from a
locality-aware one.