FWIW, George found what looks like a race condition in the sm init code today -- we don't seem to call maffinity anywhere in the sm btl startup, so we're not actually guaranteed that the memory is local to any particular process(or) (!). This race shouldn't cause segvs, though; it should only mean that memory may be farther away than we intended.

The central question is: does "first touch" mean a read or a write? I.e., is the first process to either read *or* write a given location considered the "first toucher"? Or only the first process to write it?
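A minimal sketch of the "owner does the first touch" idea, assuming Linux, a MAP_SHARED|MAP_ANONYMOUS segment split into equal per-process slices, and that each process has already been bound to a processor elsewhere (this code does no binding at all). The names touch_pages and owner_first_touch are hypothetical, not Open MPI functions. It deliberately writes rather than reads each page: for private anonymous memory on Linux, a read of a never-written page can be satisfied by the shared zero page, so a read does not reliably allocate a node-local frame, whereas a write forces a real frame to be allocated under the current memory policy.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write one byte in every page of [base, base+len)
   so the kernel must back each page with a physical frame now. Under a
   first-touch policy, that frame goes near the CPU running the caller.
   Returns the number of pages touched. */
static size_t touch_pages(volatile char *base, size_t len)
{
    assert(base != NULL || len == 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t touched = 0;
    for (size_t off = 0; off < len; off += page) {
        base[off] = 0; /* a write, not a read */
        touched++;
    }
    return touched;
}

/* Hypothetical setup: one shared segment, one slice per process. Each
   child writes to its own slice before anyone else touches it, so (if
   the child is bound to a CPU) the slice lands on that child's node.
   Returns 0 on success, -1 if mmap fails, else the number of failures. */
static int owner_first_touch(int nprocs, size_t slice_len)
{
    size_t total = (size_t)nprocs * slice_len;
    char *seg = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED)
        return -1;

    int failures = 0;
    for (int rank = 0; rank < nprocs; rank++) {
        pid_t pid = fork();
        if (pid < 0) {
            failures++;
            break;
        }
        if (pid == 0) {
            /* Child: the "owner" of slice `rank` does the first touch. */
            touch_pages(seg + (size_t)rank * slice_len, slice_len);
            _exit(0);
        }
    }

    /* Reap every child we started and count abnormal exits. */
    int status;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;

    munmap(seg, total);
    return failures;
}
```

Without binding, the kernel places each page on whatever node the writing process happens to be running on at that moment, which is the weakness George's finding points to.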


On Mar 30, 2009, at 7:01 PM, Eugene Loh wrote:

Jeff Squyres wrote:

> On Mar 30, 2009, at 1:40 PM, Patrick Geoffray wrote:
>
>> > we will have to find a
>> > pretty smart way to do this or we will completely break the memory
>> > affinity stuff.
>>
>> I didn't look at the code, but I sure hope that the SM init code does
>> touch each page to force allocation; otherwise there is no memory
>> affinity stuff at all...
>
> Why not?  The "owning" process can do the touch; then it'll be
> affinity'ed properly.  Right?

So far as I can tell, the code has two mechanisms for memory placement.
One is to create a different mpool for each affinity pool.  The second
is to have the correct owner perform the first touch.  (It's not clear
to me that the first mechanism is working, makes sense, is necessary,
etc. I just don't know.) Anyhow, we do indeed want proper first touch
and the code seems to respect that.
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Jeff Squyres
Cisco Systems
