To expand slightly on Patrick's last comment:
> Cache prefetching is slightly
> more efficient on local socket, so closer to reader may be a bit better.
Ideally one polls from cache, but if the line is evicted (or invalidated),
the next poll after that eviction pays a lower cost when the memory is
close to the reader.
-Paul
Patrick Geoffray wrote:
> Richard Graham wrote:
> > Yes - it is polling volatile memory, so has to load from memory on
> > every read.
> Actually, it will poll in cache, and only load from memory when the
> cache coherency protocol invalidates the cache line. Volatile semantics
> only prevent compiler optimizations.
> It does not matter much where the pages are (closer to reader or
> receiver) on NUMA systems, as long as they are equally distributed among
> all sockets (i.e. the choice is consistent). Cache prefetching is slightly
> more efficient on the local socket, so closer to the reader may be a bit
> better.
> Patrick
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Paul H. Hargrove phhargr...@lbl.gov
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900