Is it correct to assume that any of the hwloc_membind_flags_t flags can be
OR'ed together, except HWLOC_MEMBIND_THREAD and HWLOC_MEMBIND_PROCESS?
Judging by their values (consecutive integers 0 through 5 rather than distinct
bits), it looks like the policy values cannot be OR'ed. This is probably worth
mentioning in the docs (I can do so, but won't commit until the rest of these
questions are answered).
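A quick way to see the problem: the C sketch below copies the policy values from the header excerpts in this message into a local enum (the DEMO_* and demo_* names are hypothetical stand-ins; it deliberately does not include hwloc.h) and checks whether OR'ing two policies can produce another valid policy. With consecutive values it can: FIRSTTOUCH | BIND == 3 == INTERLEAVE, and FIRSTTOUCH | REPLICATE == 5 == NEXTTOUCH.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copy of the policy values quoted in this message. The DEMO_*
 * names are hypothetical; this deliberately does not use hwloc.h. */
enum demo_membind_policy {
    DEMO_MEMBIND_DEFAULT    = 0,
    DEMO_MEMBIND_FIRSTTOUCH = 1,
    DEMO_MEMBIND_BIND       = 2,
    DEMO_MEMBIND_INTERLEAVE = 3,
    DEMO_MEMBIND_REPLICATE  = 4,
    DEMO_MEMBIND_NEXTTOUCH  = 5
};

#define DEMO_NUM_POLICIES 6u

/* OR two policy values together, as one would with bit flags. */
static unsigned demo_or(enum demo_membind_policy a,
                        enum demo_membind_policy b)
{
    assert((unsigned)a < DEMO_NUM_POLICIES);
    assert((unsigned)b < DEMO_NUM_POLICIES);
    return (unsigned)a | (unsigned)b;
}

/* Returns true if some pair of distinct non-DEFAULT policies ORs to a
 * value that is itself a valid policy, i.e. the combination cannot be
 * told apart from a single policy. */
static bool demo_policies_ambiguous_when_ored(void)
{
    for (unsigned a = 1; a < DEMO_NUM_POLICIES; a++) {
        for (unsigned b = a + 1; b < DEMO_NUM_POLICIES; b++) {
            unsigned combined = demo_or((enum demo_membind_policy)a,
                                        (enum demo_membind_policy)b);
            if (combined < DEMO_NUM_POLICIES)
                return true;
        }
    }
    return false;
}
```

If the policies were meant to be combinable, they would need power-of-two values (1, 2, 4, 8, ...) instead.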
Here are all the policy values:
-----
HWLOC_MEMBIND_DEFAULT = 0,    /**< \brief Reset the memory allocation
                               *   policy to the system default.
                               * \hideinitializer */
HWLOC_MEMBIND_FIRSTTOUCH = 1, /**< \brief Allocate memory on the
                               *   given nodes, but preferably on the
                               *   node where the first accessor is
                               *   running.
                               * \hideinitializer */
-----
I'm not quite sure what "where the first accessor is running" means. Is the
intent that the memory be bound to the NUMA node local to the first thread that
touches it?
If so, does this happen page by page, or for the allocation as a whole?
Consider this example (assume no race conditions):
1. allocate 2 pages with the FIRSTTOUCH policy
2. thread A on node X only touches page 0
3. later, thread B on node Y touches page 1
Where are pages 0 and 1 bound? Are they bound to X and Y, respectively, or are
both bound to X?
...or is the answer OS/system specific? If so, is there a way to find out how
the memory was actually bound?
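To pin down the two readings, here is a hypothetical C model of the example above (not hwloc code; the demo_* functions and the per-page placement array are illustrative only). Reading 1 places each page on the node of whichever thread touches that page first; reading 2 places the whole allocation on the node of the first toucher. With thread A on node X = 0 touching page 0 and then thread B on node Y = 1 touching page 1, reading 1 yields {X, Y} and reading 2 yields {X, X}. Which one hwloc or the OS actually implements is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_UNPLACED (-1)

/* Hypothetical model of the two readings of FIRSTTOUCH; not hwloc code.
 * placement[i] holds the node where page i ended up, or DEMO_UNPLACED. */

/* Reading 1: each page is placed on the node of the first thread that
 * touches that particular page. */
static void demo_touch_per_page(int *placement, size_t npages,
                                size_t page, int node)
{
    assert(page < npages);
    if (placement[page] == DEMO_UNPLACED)
        placement[page] = node;
}

/* Reading 2: the first touch of any page places the whole allocation
 * on the toucher's node; later touches change nothing. */
static void demo_touch_whole(int *placement, size_t npages,
                             size_t page, int node)
{
    assert(page < npages);
    if (placement[page] != DEMO_UNPLACED)
        return; /* all pages were placed together already */
    for (size_t i = 0; i < npages; i++)
        placement[i] = node;
}
```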
-----
HWLOC_MEMBIND_BIND = 2,       /**< \brief Allocate memory on the
                               *   given nodes.
                               * \hideinitializer */
HWLOC_MEMBIND_INTERLEAVE = 3, /**< \brief Allocate memory on the
                               *   given nodes in a round-robin manner.
                               * \hideinitializer */
-----
What is the unit of distribution: is it the page? E.g., if I specify 4 NUMA
nodes and allocate 10 pages, are they bound round-robin like this:
node A: 0, 4, 8
node B: 1, 5, 9
node C: 2, 6
node D: 3, 7
Or are the pages split into (more or less) equal contiguous blocks across the
4 nodes, like this:
node A: 0, 1, 2
node B: 3, 4, 5
node C: 6, 7
node D: 8, 9
...or is the answer OS/system specific? If so, is there a way to find out how
the memory was actually bound?
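The two candidate layouts can be stated precisely. In this hypothetical sketch (illustrative helpers, not hwloc API), nodes A-D are numbered 0-3. Round-robin maps page p to node p mod 4; the block layout gives the first (10 mod 4) = 2 nodes one extra page, so block sizes are 3, 3, 2, 2. Both functions reproduce the two tables above for 10 pages on 4 nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers contrasting the two layouts in the question;
 * neither is claimed to be what hwloc or the OS does. Nodes are
 * numbered 0..nnodes-1 (A=0, B=1, C=2, D=3). */

/* Round-robin: page p goes to node p mod nnodes. */
static size_t demo_round_robin_node(size_t page, size_t nnodes)
{
    assert(nnodes > 0);
    return page % nnodes;
}

/* Contiguous blocks: the first (npages % nnodes) nodes get one extra
 * page, so block sizes differ by at most one. */
static size_t demo_block_node(size_t page, size_t npages, size_t nnodes)
{
    assert(nnodes > 0 && page < npages);
    size_t base = npages / nnodes;       /* minimum pages per node    */
    size_t extra = npages % nnodes;      /* nodes holding base + 1    */
    size_t big_span = extra * (base + 1); /* pages in the larger blocks */
    if (page < big_span)
        return page / (base + 1);
    return extra + (page - big_span) / base; /* base > 0 here */
}
```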
-----
HWLOC_MEMBIND_REPLICATE = 4,  /**< \brief Replicate memory on the
                               *   given nodes.
                               * \hideinitializer */
-----
Does this mean that if I allocate 10 pages' worth of memory with 2 nodes
specified, I'm actually allocating 2x that amount, with a full copy on each
node? I.e., is the memory bound like this:
node A: 0, 1, 2, ..., 9
node B: 0, 1, 2, ..., 9
and that a write to page 0 will physically write to *both* copies of that page?
If so, what's the cost of a write: the time to write to all nodes, or the time
to write to the first node that was specified?
What happens with reads? Does the data come from the first node that was
specified, so that a read costs as much as fetching the data from that node?
More to the point, what's the purpose of REPLICATE? Is it solely for memory
hardware fault tolerance (e.g., Intel RAS)?
What happens if the hardware/OS does not support REPLICATE? Will an error be
returned?
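Under the interpretation above (one full copy per node, every write updating all copies), the costs are easy to tally. This hypothetical sketch assumes a 4 KiB page purely as an example: 10 pages replicated on 2 nodes would occupy 10 * 4096 * 2 = 81920 bytes, and each logical write would turn into 2 physical writes. Whether REPLICATE actually behaves this way is exactly what I'm asking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cost model for ONE reading of REPLICATE: a full copy of
 * the allocation on every requested node, with every write applied to
 * every copy. Not confirmed hwloc or OS behavior. */

/* Physical bytes consumed by npages pages replicated on nnodes nodes. */
static size_t demo_replicated_bytes(size_t npages, size_t page_size,
                                    size_t nnodes)
{
    assert(nnodes > 0);
    return npages * page_size * nnodes;
}

/* Physical page writes needed for a given number of logical writes
 * when each write must update every replica. */
static size_t demo_replicated_writes(size_t logical_writes, size_t nnodes)
{
    assert(nnodes > 0);
    return logical_writes * nnodes;
}
```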
-----
HWLOC_MEMBIND_NEXTTOUCH = 5   /**< \brief On next touch of existing
                               *   allocated memory, migrate it to the node
                               *   where the memory reference happened.
                               * \hideinitializer */
-----
What happens if the memory was not previously bound?
Same question as for FIRSTTOUCH: does this happen page by page, or for the
entire allocation? E.g., if I allocate and bind 10 pages, later set them to
NEXTTOUCH, and then touch the 4th page, will the entire allocation move to the
NUMA node local to the thread that did the touch, or just the 4th page?
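Same idea for NEXTTOUCH, again as a hypothetical model rather than hwloc code (the demo_* names and the per-page "pending" flags are illustrative only): 10 pages start on node 0, all are marked for migration, and a thread on node 2 touches the 4th page (index 3). Under the per-page reading only page 3 moves; under the whole-allocation reading all 10 pages move to node 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the two readings of NEXTTOUCH; not hwloc code.
 * node[i] is where page i currently lives; pending[i] is true while
 * page i is marked for migration on its next touch. */

static void demo_mark_next_touch(bool *pending, size_t npages)
{
    for (size_t i = 0; i < npages; i++)
        pending[i] = true;
}

/* Reading 1: only the touched page migrates to the toucher's node. */
static void demo_next_touch_per_page(int *node, bool *pending,
                                     size_t npages, size_t page, int toucher)
{
    assert(page < npages);
    if (pending[page]) {
        node[page] = toucher;
        pending[page] = false;
    }
}

/* Reading 2: touching any marked page migrates the whole allocation. */
static void demo_next_touch_whole(int *node, bool *pending,
                                  size_t npages, size_t page, int toucher)
{
    assert(page < npages);
    if (!pending[page])
        return;
    for (size_t i = 0; i < npages; i++) {
        node[i] = toucher;
        pending[i] = false;
    }
}
```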
Thanks!
--
Jeff Squyres
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/