Is it correct to assume that any hwloc_membind_flags_t flags can be OR'ed 
together except _THREAD and _PROCESS?

By their values, it looks like policy flags cannot be OR'ed.  This is probably 
worth mentioning in the docs (I can do so, but won't commit until the rest of 
these questions are answered).
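
To make the collision concrete, here is a small sketch.  The POLICY_* names 
are local stand-ins that mirror the values quoted below (they are not the real 
hwloc header), and is_single_bit() is a helper I made up for illustration.

```c
#include <assert.h>

/* Local mirror of the policy values quoted in this mail.
 * Hypothetical names; NOT the real hwloc declarations. */
enum policy_sketch {
    POLICY_DEFAULT    = 0,
    POLICY_FIRSTTOUCH = 1,
    POLICY_BIND       = 2,
    POLICY_INTERLEAVE = 3,
    POLICY_REPLICATE  = 4,
    POLICY_NEXTTOUCH  = 5
};

/* A value can only act as an OR-able flag if exactly one bit is set. */
int is_single_bit(int value)
{
    assert(value >= 0);
    return value != 0 && (value & (value - 1)) == 0;
}

/* Plain bitwise OR, shown only to expose the collisions. */
int or_policies(int a, int b)
{
    return a | b;
}
```

With these values, FIRSTTOUCH | BIND cannot be told apart from INTERLEAVE, and 
FIRSTTOUCH | REPLICATE cannot be told apart from NEXTTOUCH, which is why I 
assume only one policy can be given at a time.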

Here are all the policy flags:

-----
  HWLOC_MEMBIND_DEFAULT =       0,      /**< \brief Reset the memory allocation policy to the system default.
                                         * \hideinitializer */
  HWLOC_MEMBIND_FIRSTTOUCH =    1,      /**< \brief Allocate memory on the given nodes, but preferably on the
                                          node where the first accessor is running.
                                         * \hideinitializer */
-----

I'm not quite sure what "where the first accessor is running" means.  Is the 
intent that the memory will be bound to the NUMA node local to the first 
thread that touches the memory?

If so, does this happen on a page-by-page basis, or as a whole allocation?  
Consider this example (assume no race conditions):

 1. allocate 2 pages with the FIRSTTOUCH policy
 2. thread A on node X only touches page 0
 3. later, thread B on node Y touches page 1

Where are pages 0 and 1 bound?  Are they bound to X and Y, respectively, or are 
both bound to X?

...or is the answer OS/system specific?  If so, is there a way to find out 
which way it bound?
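
To pin down the two readings of the example above, here is a toy model.  It 
only simulates the two interpretations; it does not call hwloc or the OS, and 
all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UNPLACED (-1)

/* Interpretation 1: each page is placed on the node of its own
 * first toucher, independently of the other pages. */
void touch_per_page(int *placement, size_t num_pages, size_t page, int node)
{
    assert(page < num_pages);
    if (placement[page] == UNPLACED)
        placement[page] = node;
}

/* Interpretation 2: the first touch anywhere in the buffer places
 * the whole allocation on the toucher's node. */
void touch_whole_allocation(int *placement, size_t num_pages, size_t page, int node)
{
    assert(page < num_pages);
    if (placement[0] != UNPLACED)
        return; /* all pages were already placed by an earlier touch */
    for (size_t i = 0; i < num_pages; i++)
        placement[i] = node;
}
```

Replaying steps 2 and 3 (thread A on node X touches page 0, then thread B on 
node Y touches page 1) gives X and Y under the first reading, and X and X 
under the second.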

-----
  HWLOC_MEMBIND_BIND =          2,      /**< \brief Allocate memory on the given nodes.
                                         * \hideinitializer */
  HWLOC_MEMBIND_INTERLEAVE =    3,      /**< \brief Allocate memory on the given nodes in a round-robin manner.
                                         * \hideinitializer */
-----

What is the unit of distribution -- is it by page?  E.g., if I specify 4 NUMA 
nodes and allocate 10 pages, are they bound like this:

node A: 0, 4, 8
node B: 1, 5, 9
node C: 2, 6
node D: 3, 7

Or does it (more-or-less) equally distribute the pages across the 4 nodes in 
contiguous blocks, like this:

node A: 0, 1, 2
node B: 3, 4, 5
node C: 6, 7
node D: 8, 9

...or is the answer OS/system specific?  If so, is there a way to find out 
which way it bound?
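
For concreteness, the two candidate layouts can be written as page-to-node 
mappings.  This is just the arithmetic behind the two tables above, with node 
A = 0 through D = 3; the function names are hypothetical and it makes no claim 
about what hwloc or any OS actually does.

```c
#include <assert.h>

/* Layout 1: page-granularity round-robin (0, 4, 8 on the first node). */
int rr_node(int page, int num_nodes)
{
    assert(page >= 0 && num_nodes > 0);
    return page % num_nodes;
}

/* Layout 2: contiguous blocks of nearly equal size.  The first
 * (num_pages % num_nodes) nodes get one extra page each. */
int block_node(int page, int num_pages, int num_nodes)
{
    assert(num_nodes > 0 && page >= 0 && page < num_pages);
    int base = num_pages / num_nodes;      /* pages every node gets */
    int extra = num_pages % num_nodes;     /* nodes holding base + 1 pages */
    int boundary = extra * (base + 1);     /* first page of the smaller blocks */
    if (page < boundary)
        return page / (base + 1);
    return extra + (page - boundary) / base;
}
```

With 10 pages and 4 nodes, rr_node() puts page 9 on node B, while block_node() 
puts it on node D.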

-----
  HWLOC_MEMBIND_REPLICATE =     4,      /**< \brief Replicate memory on the given nodes.
                                         * \hideinitializer */
-----

Does this mean that if I allocate 10 pages worth of memory with 2 nodes 
specified, I'm actually allocating 2x that amount and duplicating it on both 
nodes?  I.e., is the memory bound like this:

node A: 0, 1, 2, ..., 9
node B: 0, 1, 2, ..., 9

and that a write to page 0 will physically write to *both* pages?  If so, 
what's the cost of the write?  Is it the time to write to all nodes, or the 
time to write to the first node that was specified?

What happens with reads?  Does the data come from the first node that was 
specified, and therefore the cost of a read is the cost of getting the data 
from the first node that was specified?

More specifically, what's the point of REPLICATE?  Is it solely for memory 
hardware fault tolerance (e.g., Intel RAS)?

What happens if the hardware/OS isn't capable of doing REPLICATE?  Will some 
kind of error be returned?

-----
  HWLOC_MEMBIND_NEXTTOUCH =     5       /**< \brief On next touch of existing allocated memory, migrate it to the node
                                         * where the memory reference happened.
                                         * \hideinitializer */
-----

What happens if the memory was not previously bound?

Same questions as above with FIRSTTOUCH -- is this on a page-by-page basis, or 
as an entire allocation?  E.g., if I allocate/bind 10 pages, then later set 
them to NEXTTOUCH, and then touch the 4th page, will the entire allocation be 
moved to the NUMA node that is local to the thread where the touch occurred, 
or just the 4th page?

Thanks!

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

