Jeff Squyres, on Tue 04 Jan 2011 21:57:56 +0100, wrote:
> Is it correct to assume that any hwloc_membind_flags_t flags can be or'ed 
> together except _THREAD and _PROCESS?

Yes, they really are flags (except _THREAD and _PROCESS which are
exclusive of course).
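
A caller-side check for that one rule could look like the sketch below. The `SKETCH_` names are hypothetical stand-ins for the `HWLOC_MEMBIND_*` flags, and the bit values are assumptions (this thread does not quote them), so check hwloc.h before relying on them.

```c
#include <assert.h>

/* Hypothetical stand-ins for hwloc_membind_flags_t. The bit values are
 * assumptions; the real ones live in hwloc.h. */
enum sketch_membind_flags {
    SKETCH_MEMBIND_PROCESS = 1 << 0,
    SKETCH_MEMBIND_THREAD  = 1 << 1,
    SKETCH_MEMBIND_STRICT  = 1 << 2,
    SKETCH_MEMBIND_MIGRATE = 1 << 3
};

/* Return 1 if the OR'ed flags form a legal combination, 0 otherwise.
 * The only rule: THREAD and PROCESS must not both be set. */
static int sketch_flags_valid(int flags)
{
    const int both = SKETCH_MEMBIND_PROCESS | SKETCH_MEMBIND_THREAD;
    assert(flags >= 0);
    return (flags & both) != both;
}
```

Any other mix, e.g. `SKETCH_MEMBIND_THREAD | SKETCH_MEMBIND_STRICT`, passes the check.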

> By their values, it looks like policy flags cannot be OR'ed.

Yes.
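
To illustrate why, here is a small sketch using a local copy of the policy values quoted further down in this message. The `SKETCH_` names are hypothetical, chosen so the copy cannot clash with hwloc.h. Since the values are consecutive integers rather than bits, OR'ing two of them silently yields some other policy.

```c
#include <assert.h>

/* Local copy of the policy values quoted in this thread. */
enum sketch_policy {
    SKETCH_DEFAULT = 0, SKETCH_FIRSTTOUCH = 1, SKETCH_BIND = 2,
    SKETCH_INTERLEAVE = 3, SKETCH_REPLICATE = 4, SKETCH_NEXTTOUCH = 5
};

/* OR two policies the way a confused caller might. The result is just an
 * integer, and it can collide with an unrelated policy value. */
static int sketch_or_policies(enum sketch_policy a, enum sketch_policy b)
{
    assert(a <= SKETCH_NEXTTOUCH && b <= SKETCH_NEXTTOUCH);
    return (int)a | (int)b;
}
```

For instance, `SKETCH_FIRSTTOUCH | SKETCH_BIND` is 1 | 2 = 3, which is the value of `SKETCH_INTERLEAVE`.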

> Here's all the policy flags:
> 
> -----
>   HWLOC_MEMBIND_DEFAULT =     0,      /**< \brief Reset the memory allocation policy to the system default.
>                                        * \hideinitializer */
>   HWLOC_MEMBIND_FIRSTTOUCH =  1,      /**< \brief Allocate memory on the given nodes, but preferably on the
>                                        * node where the first accessor is running.
>                                        * \hideinitializer */
> -----
> 
> I'm not quite sure what "where the first accessor is running" means.  Does 
> this mean that the intent is that the memory will be bound to the numa node 
> local to the first thread that touches the memory?

Err, yes. Feel free to rephrase to anything that would be clearer.

> If so, does this happen on a page-by-page basis, or as a whole allocation?

page-by-page.
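
A toy model of what page-by-page means here, as a sketch only: it simulates placement in a plain array and does not call hwloc or the OS. All names are hypothetical.

```c
#include <assert.h>

#define SKETCH_NPAGES   10
#define SKETCH_UNPLACED (-1)

/* node_of[i] is the NUMA node holding page i, or SKETCH_UNPLACED. */
struct sketch_region { int node_of[SKETCH_NPAGES]; };

static void sketch_region_init(struct sketch_region *r)
{
    for (int i = 0; i < SKETCH_NPAGES; i++)
        r->node_of[i] = SKETCH_UNPLACED;
}

/* A thread running on `node` touches `page`. Only the first touch of a
 * page places it; later touches leave it where it is.
 * Returns the node that holds the page afterwards. */
static int sketch_first_touch(struct sketch_region *r, int page, int node)
{
    assert(page >= 0 && page < SKETCH_NPAGES);
    if (r->node_of[page] == SKETCH_UNPLACED)
        r->node_of[page] = node;
    return r->node_of[page];
}
```

If a thread on node 0 touches pages 0-4 and then a thread on node 1 touches pages 3-9, pages 0-4 end up on node 0 and pages 5-9 on node 1: a single allocation is split across nodes.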

> -----
>   HWLOC_MEMBIND_BIND =        2,      /**< \brief Allocate memory on the given nodes.
>                                        * \hideinitializer */
>   HWLOC_MEMBIND_INTERLEAVE =  3,      /**< \brief Allocate memory on the given nodes in a round-robin manner.
>                                        * \hideinitializer */
> -----
> 
> What is the unit of distribution -- is it by page?

Mmm, OS documentation doesn't specify it. It usually only talks
about "round-robin allocation", "interleaved allocation", "striped
allocation", or simply "accessed by many processors, thus distribute the
memory".

> If so, is there a way to find out which way it bound?

We can try to benchmark memory accesses, but I don't think we should
be too specific, because that'd mean adding yet more policies for the
programmer to choose from and try. We can however explain that it's
useful when a given range of memory is accessed by many processors, and
the memory access load should thus be distributed across nodes.
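
For intuition only, round-robin distribution can be sketched as below. The page granularity is an assumption: as said above, the OS does not promise any particular unit. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Node that would hold `page` when distributing round-robin over
 * nodes[0..nnodes-1]. Assumes one page is the unit of distribution,
 * which the OS may not actually use. */
static int sketch_interleave_node(size_t page, const int *nodes, size_t nnodes)
{
    assert(nodes != NULL && nnodes > 0);
    return nodes[page % nnodes];
}
```

With nodes {0, 2}, even-numbered pages would land on node 0 and odd-numbered pages on node 2, so accesses from many processors spread over both nodes.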

> -----
>   HWLOC_MEMBIND_REPLICATE =   4,      /**< \brief Replicate memory on the given nodes.
>                                        * \hideinitializer */
> -----
> 
> Does this mean that if I allocate 10 pages worth of memory with 2 nodes 
> specified, I'm actually allocating 2x that amount and duplicating it on both 
> nodes?

Yes.

> I.e., is the memory bound like this:
> 
> node A: 0, 1, 2, ..., 9
> node B: 0, 1, 2, ..., 9
> 
> and that a write to page 0 will physically write to *both* pages?

Actually, it's usually only supported for read-only data.

> What happens with reads?  Does the data come from the first node that was 
> specified, and therefore the cost of a read is the cost of getting the data 
> from the first node that was specified?

Each thread accesses its local NUMA node, that's precisely the point
of replicating the data :)

> More specifically, what's the point of REPLICATE?  Is it solely for memory 
> hardware fault tolerance (e.g., intel RAS)?  

Not at all, it's really for performance reasons.

> What happens if the hardware/OS isn't capable of doing REPLICATE?  Will some 
> kind of error be returned?

ENOSYS, as usual (and there is also the support flag for it in the
topology structure). Actually, at the moment only OSF supports it.
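
On the caller side, handling that error might look like the sketch below. `mock_set_membind` is a hypothetical stand-in for a real binding call on an OS without replication, not an hwloc function, and falling back to interleaving is only one possible choice (it at least spreads the access load across nodes).

```c
#include <assert.h>
#include <errno.h>

/* Policy values as quoted in this thread, under hypothetical names. */
enum { MOCK_INTERLEAVE = 3, MOCK_REPLICATE = 4 };

/* Hypothetical stand-in for a binding call on an OS that cannot
 * replicate: fails with ENOSYS for REPLICATE, succeeds otherwise. */
static int mock_set_membind(int policy)
{
    assert(policy >= 0);
    if (policy == MOCK_REPLICATE) {
        errno = ENOSYS;
        return -1;
    }
    return 0;
}

/* Try REPLICATE and fall back to INTERLEAVE when it is unsupported.
 * Returns the policy that took effect, or -1 on any other error. */
static int bind_replicate_or_interleave(void)
{
    if (mock_set_membind(MOCK_REPLICATE) == 0)
        return MOCK_REPLICATE;
    if (errno != ENOSYS)
        return -1;
    return mock_set_membind(MOCK_INTERLEAVE) == 0 ? MOCK_INTERLEAVE : -1;
}
```

Checking the support flag in the topology structure beforehand avoids the failed call altogether.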

> -----
>   HWLOC_MEMBIND_NEXTTOUCH =   5       /**< \brief On next touch of existing allocated memory, migrate it to the node
>                                        * where the memory reference happened.
>                                        * \hideinitializer */
> -----
> 
> What happens if the memory was not previously bound?

It gets bound.

> Same questions as above with FIRSTTOUCH -- is this on a page-by-page basis, 
> or as an entire allocation?

Page-by-page.
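
Again as a toy model (plain arrays, no hwloc or OS calls, hypothetical names), next-touch can be pictured as a per-page "migrate on next access" mark:

```c
#include <assert.h>
#include <stdbool.h>

#define NT_NPAGES 10

struct nt_region {
    int  node_of[NT_NPAGES];   /* current node of each page, -1 if unbound */
    bool pending[NT_NPAGES];   /* true: move the page on its next touch */
};

/* Apply the next-touch policy: mark every page for migration. */
static void nt_mark_all(struct nt_region *r)
{
    for (int i = 0; i < NT_NPAGES; i++)
        r->pending[i] = true;
}

/* A thread on `node` touches `page`. A marked page moves to that node
 * (or gets bound there if it was unbound), exactly once.
 * Returns the node that holds the page afterwards. */
static int nt_touch(struct nt_region *r, int page, int node)
{
    assert(page >= 0 && page < NT_NPAGES);
    if (r->pending[page]) {
        r->node_of[page] = node;
        r->pending[page] = false;
    }
    return r->node_of[page];
}
```

Pages that are never touched again stay where they were, and each touched page follows its own next accessor independently of the rest of the allocation.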

Thanks for your review, it's really useful to make sure that things
which are obvious to me since I've written the code are properly
documented :)

Samuel
