Jeff Squyres, on Tue 18 Jan 2011 20:00:42 +0100, wrote:
> On Jan 12, 2011, at 10:10 AM, Samuel Thibault wrote:
> > This is not what I meant: hwloc_alloc_membind_policy's purpose is only
> > to allocate bound memory.  It happens that hwloc_alloc_membind_policy
> > _may_ change the process policy in order to be able to bind memory
> > at all (when the underlying OS does not have a directed allocation
> > primitive), but that's not necessary.  If hwloc can simply call a
> > directed allocation primitive, it will do it.  If the OS doesn't support
> > binding at all, then hwloc will just allocate memory.
>
> How's this?
>
>  * Setting this policy will cause the OS to try to bind a new memory
>  * allocation to the specified set.

Err, no, again hwloc_alloc_membind_policy's purpose is _not_ to set a
policy for future allocations, but _only_ to allocate data.  It just
_happens_ to possibly have to change the current process policy in order
to achieve the binding, but that's only a side effect.  Think of it as
"allocate bound memory, possibly changing the policy just for that".

> As a side effect, some operating
>  * systems may change the current memory binding policy;

It's not really the system that changes the current memory binding
policy, it's hwloc which explicitly requests the operating system to do
so, in order to actually get the desired binding.  I have rephrased it.

> >> +  HWLOC_MEMBIND_INTERLEAVE = 3,  /**< \brief Allocate memory on
> >
> > This is not really correct: if the threads were splitting the memory
> > amongst themselves, FIRSTTOUCH should be used instead, to migrate pages
> > close to where they are referenced from.  I have rephrased that
>
> What's a good simple example scenario when it would be good to use
> INTERLEAVE, then?

Well, this is what I have put instead:

"Interleaving can be useful when threads distributed across the specified
NUMA nodes will all be accessing the whole memory range concurrently, since
the interleave will then balance the memory references."

By "the whole", I really mean _all_ the threads will access the _whole_
range, without known separation, e.g. a coefficient vector that all
threads need to read to perform some computation.

Samuel