Jeff Squyres wrote on Tue 04 Jan 2011 21:57:56 +0100:
> Is it correct to assume that any hwloc_membind_flags_t flags can be or'ed
> together except _THREAD and _PROCESS?

Yes, they really are flags (except _THREAD and _PROCESS, which are of
course mutually exclusive).

> By their values, it looks like policy flags cannot be OR'ed.

Yes.

> Here's all the policy flags:
>
> -----
> HWLOC_MEMBIND_DEFAULT = 0, /**< \brief Reset the memory allocation
> policy to the system default.
> * \hideinitializer */
> HWLOC_MEMBIND_FIRSTTOUCH = 1, /**< \brief Allocate memory on the
> given nodes, but preferably on the
> node where the first accessor is
> running.
> * \hideinitializer */
> -----
>
> I'm not quite sure what "where the first accessor is running" means. Does
> this mean that the intent is that the memory will be bound to the numa node
> local to the first thread that touches the memory?

Err, yes. Feel free to rephrase it into something clearer.

> If so, does this happen on a page-by-page basis, or as a whole allocation?

Page-by-page: each page preferably goes to the node of the first thread
that touches that particular page.

> -----
> HWLOC_MEMBIND_BIND = 2, /**< \brief Allocate memory on
> the given nodes.
> * \hideinitializer */
> HWLOC_MEMBIND_INTERLEAVE = 3, /**< \brief Allocate memory on the
> given nodes in a round-robin manner.
> * \hideinitializer */
> -----
>
> What is the unit of distribution -- is it by page?

Mmm, the OS documentation doesn't specify it. It usually only talks about
"round-robin allocation", "interleaved allocation", "striped allocation",
or simply "accessed by many processors, thus distribute the memory".

> If so, is there a way to find out which way it bound?

We could try to benchmark memory accesses, but I don't think we want to be
too specific, because that would mean adding yet more policies for the
programmer to choose from and try. We can, however, explain that it is
useful when a given range of memory is accessed by many processors, so that
the memory access load gets distributed across nodes.

> -----
> HWLOC_MEMBIND_REPLICATE = 4, /**< \brief Replicate memory on the
> given nodes.
> * \hideinitializer */
> -----
>
> Does this mean that if I allocate 10 pages worth of memory with 2 nodes
> specified, I'm actually allocating 2x that amount and duplicating it on both
> nodes?

Yes.

> I.e., is the memory bound like this:
>
> node A: 0, 1, 2, ..., 9
> node B: 0, 1, 2, ..., 9
>
> and that a write to page 0 will physically write to *both* pages?

Actually, it's usually only supported for read-only data.

> What happens with reads? Does the data come from the first node that was
> specified, and therefore the cost of a read is the cost of getting the data
> from the first node that was specified?

Each thread accesses the copy on its own local NUMA node; that's precisely
the point of replicating the data :)

> More specifically, what's the point of REPLICATE? Is it solely for memory
> hardware fault tolerance (e.g., intel RAS)?

Not at all, it's really for performance reasons.

> What happens if the hardware/OS isn't capable of doing REPLICATE? Will some
> kind of error be returned?

ENOSYS, as usual (and there is also the support flag for it in the
topology structure). Actually, at the moment only OSF supports it.

> -----
> HWLOC_MEMBIND_NEXTTOUCH = 5 /**< \brief On next touch of existing
> allocated memory, migrate it to the node
> * where the memory reference happened.
> * \hideinitializer */
> -----
>
> What happens if the memory was not previously bound?

It gets bound.

> Same questions as above with FIRSTTOUCH -- is this on a page-by-page basis,
> or as an entire allocation?

Page-by-page.

Thanks for your review; it's really useful for making sure that things
which are obvious to me, since I wrote the code, are properly documented :)

Samuel
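Here is a toy model in plain C that makes the policy/flag distinction from the start of this thread concrete. It is not hwloc code. The policy values are the ones quoted in the thread. The MEMBIND_FLAG_* bit values and the check_membind_args() helper are hypothetical, chosen only to illustrate the rule as stated: any flags may be OR'ed together, except _THREAD with _PROCESS, while a policy is a single value.

```c
#include <assert.h>
#include <stdbool.h>

/* Policy values as quoted in the thread: a call picks exactly one. */
enum membind_policy {
    MEMBIND_DEFAULT    = 0,
    MEMBIND_FIRSTTOUCH = 1,
    MEMBIND_BIND       = 2,
    MEMBIND_INTERLEAVE = 3,
    MEMBIND_REPLICATE  = 4,
    MEMBIND_NEXTTOUCH  = 5
};

/* Hypothetical flag bits (NOT hwloc's actual values): one bit each,
 * so they can be combined with | . */
enum membind_flag {
    MEMBIND_FLAG_PROCESS = 1 << 0,
    MEMBIND_FLAG_THREAD  = 1 << 1,
    MEMBIND_FLAG_STRICT  = 1 << 2,
    MEMBIND_FLAG_MIGRATE = 1 << 3
};

/* OR-ing two policies silently yields a third one:
 * FIRSTTOUCH (1) | BIND (2) == INTERLEAVE (3). */
static_assert((MEMBIND_FIRSTTOUCH | MEMBIND_BIND) == MEMBIND_INTERLEAVE,
              "policy values overlap when OR'ed");

/* Returns true if `policy` is one enumerated value and `flags` is any
 * combination of known bits, except PROCESS together with THREAD. */
static bool check_membind_args(int policy, int flags)
{
    const int known = MEMBIND_FLAG_PROCESS | MEMBIND_FLAG_THREAD |
                      MEMBIND_FLAG_STRICT | MEMBIND_FLAG_MIGRATE;

    if (policy < MEMBIND_DEFAULT || policy > MEMBIND_NEXTTOUCH)
        return false;                  /* not a single known policy */
    if (flags & ~known)
        return false;                  /* unknown flag bits */
    if ((flags & MEMBIND_FLAG_PROCESS) && (flags & MEMBIND_FLAG_THREAD))
        return false;                  /* _PROCESS and _THREAD exclude each other */
    return true;
}
```

The static_assert shows why policies cannot be OR'ed. Their values are consecutive integers rather than distinct bits, so combining two of them selects an unrelated third policy.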
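The page-by-page behaviour described in this thread can be sketched as a small simulation. This is an illustrative model, not hwloc or OS code. struct region and its helper functions are hypothetical names. Interleaving is assumed to be round-robin at one-page granularity, since the thread notes that OS documentation does not pin down the unit. First touch ignores the restriction to "the given nodes" for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PAGES 64
#define UNPLACED  (-1)

/* Toy memory region: the NUMA node holding each page. */
struct region {
    size_t npages;
    int node[MAX_PAGES];        /* node of each page, or UNPLACED */
    int next_touch[MAX_PAGES];  /* 1 if the page migrates on next access */
};

static void region_init(struct region *r, size_t npages)
{
    assert(npages <= MAX_PAGES);
    r->npages = npages;
    for (size_t i = 0; i < npages; i++) {
        r->node[i] = UNPLACED;
        r->next_touch[i] = 0;
    }
}

/* INTERLEAVE (assumed page granularity): page i goes to nodes[i % nnodes]. */
static void place_interleave(struct region *r, const int *nodes, size_t nnodes)
{
    assert(nnodes > 0);
    for (size_t i = 0; i < r->npages; i++)
        r->node[i] = nodes[i % nnodes];
}

/* NEXTTOUCH: mark every page so that its next access migrates it. */
static void mark_next_touch(struct region *r)
{
    for (size_t i = 0; i < r->npages; i++)
        r->next_touch[i] = 1;
}

/* Access to one page by a thread running on `cpu_node`.
 * An UNPLACED page behaves like FIRSTTOUCH: it lands on the accessor's node.
 * A page marked for next touch migrates there, even if it was never bound
 * before. Every other page stays where it is. */
static void touch(struct region *r, size_t page, int cpu_node)
{
    assert(page < r->npages);
    if (r->node[page] == UNPLACED || r->next_touch[page]) {
        r->node[page] = cpu_node;
        r->next_touch[page] = 0;
    }
}

/* REPLICATE: each page is copied on every node, so the physical
 * footprint is npages * nnodes. */
static size_t replicated_pages(size_t npages, size_t nnodes)
{
    return npages * nnodes;
}
```

With 10 pages interleaved over nodes 0 and 1, even pages land on node 0 and odd pages on node 1. After mark_next_touch(), only the pages that are actually accessed migrate, one page at a time. replicated_pages(10, 2) gives the 20 physical pages from the REPLICATE example.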