FWIW, George found what looks like a race condition in the sm init
code today -- it looks like we don't call maffinity anywhere in the sm
btl startup, so we're not actually guaranteed that the memory is local
to any particular process(or) (!). This race shouldn't cause segvs,
though; it should only mean that memory is potentially farther away
than we intended.

The central question is: does "first touch" mean both read and write?
I.e., is the first process that either reads *or* writes to a given
location considered "first touch"? Or is it only the first write?
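
To make the "owner does the first touch" idea concrete, here is a minimal
sketch in C. It is not the sm btl code; touch_pages() and map_shared() are
hypothetical names made up for illustration, and it assumes the OS places
a page near whichever process first faults it in. It does a read followed
by a write on each page, so it does not depend on the answer to the
read-vs-write question above.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map an anonymous shared region of len bytes.
 * Returns NULL on failure. No pages are touched here. */
static void *map_shared(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: fault in every page of [base, base + len) from the
 * calling process. Each page is read and then written back unchanged, so
 * existing contents are preserved. base is assumed to be page-aligned.
 * Returns the number of pages touched. */
static size_t touch_pages(void *base, size_t len)
{
    assert(base != NULL || len == 0);

    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;   /* fallback is an assumption */
    volatile unsigned char *p = base;
    size_t n = 0;

    for (size_t off = 0; off < len; off += page) {
        p[off] = p[off];        /* volatile: forces both a load and a store */
        n++;
    }
    return n;
}
```

The intent would be for each process to bind itself to its processor first
and then call touch_pages() only on the part of the region it owns; whoever
creates the mapping would not touch the other processes' parts.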

On Mar 30, 2009, at 7:01 PM, Eugene Loh wrote:

Jeff Squyres wrote:
> On Mar 30, 2009, at 1:40 PM, Patrick Geoffray wrote:
>
>> > we will have to find a
>> > pretty smart way to do this or we will completely break the memory
>> > affinity stuff.
>>
>> I didn't look at the code, but I sure hope that the SM init code does
>> touch each page to force allocation, otherwise there is no memory
>> affinity stuff at all...
>
> Why not? The "owning" process can do the touch; then it'll be
> affinity'ed properly. Right?

So far as I can tell, the code has two mechanisms for memory placement.
One is to create a different mpool for each affinity pool. The second
is to have the correct owner perform the first touch. (It's not clear
to me that the first mechanism is working, makes sense, is necessary,
etc. I just don't know.) Anyhow, we do indeed want proper first touch
and the code seems to respect that.
--
Jeff Squyres
Cisco Systems