Hi, I was reading the MPI 2.2 standard and my eye fell on something amazing.
http://www.mpi-forum.org/docs/mpi-2.2/mpi22-report.pdf chapter 11 "one-sided communications", page 339: "it is erroneous to have concurrent conflicting accesses to the same memory location in a window"

Does this mean that each update, either read or write, is in itself atomic with InfiniBand?

In computer chess it can happen that we simply write to and read from the same locations. This can of course result in garbled data. Most don't care; some, like me, store a CRC and care even less. The odds that it happens are relatively small, but it happens. I measured (on an Origin3800 @ 200 CPUs @ 120GB RAM) that about once every 200 billion operations there is a coincidence where 2 writes hit the same location at the same time, resulting in garbage written to that specific cacheline, or to 2 consecutive cachelines sharing 20 bytes of data (obviously usually this last case happens - at PC hardware actually only the last case can occur and entries garbled within 1 cacheline).

Now the actual reads are about 160 bytes, of which only 20 bytes get used. So the statistical odds are a lot larger than this 1 in 200 billion that overlapping parts of RAM get requested by 2 or more cores at the same time, randomly somewhere in the cluster, and/or that writes of 20 bytes fall within that range. What's actually happening in hardware here?

As it says further: "if a location is updated by a put or accumulate operation, then this location cannot be accessed by a load or another RMA operation until the updating operation has completed." Well, it's gonna happen; not much, but sometimes.
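For readers unfamiliar with the CRC trick mentioned above, here is a minimal sketch in C of how a 20-byte entry can carry its own check so that a garbled (torn) entry is detected and simply ignored. Everything in it is an assumption made for illustration: the names entry_t, entry_checksum, entry_store and entry_valid are hypothetical, the 16+4 byte layout is a guess, and FNV-1a stands in for whatever CRC a real engine would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 20-byte table entry: 16 payload bytes plus a 32-bit check. */
typedef struct {
    uint8_t  payload[16];
    uint32_t check;
} entry_t;

/* FNV-1a hash over n bytes; a stand-in for the CRC, not the real thing. */
static uint32_t entry_checksum(const uint8_t *p, size_t n)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Writer side: fill in the payload, then the check computed over it. */
static void entry_store(entry_t *e, const uint8_t payload[16])
{
    assert(e != NULL);
    memcpy(e->payload, payload, sizeof e->payload);
    e->check = entry_checksum(e->payload, sizeof e->payload);
}

/* Reader side: returns 1 if payload and check agree, 0 if the entry
 * looks garbled, in which case the caller treats it as a miss. */
static int entry_valid(const entry_t *e)
{
    return e->check == entry_checksum(e->payload, sizeof e->payload);
}
```

A reader would copy the entry out of the shared table (or fetch it from a remote node) first and only then call entry_valid() on the local copy. Two colliding writes can still by chance produce an entry that passes a 32-bit check, so this lowers the odds of using garbage; it does not make the access atomic.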
Of course I don't care if there is some slowdown in that one-in-a-billion case where 2 or more cores write/read the same memory within the window, but I do care if normal operations get slowed down by this requirement in MPI 2.2 :)

If remote cores read or write RAM by put/get (usually different, non-overlapping RMA requests), each touching a random 20-160 bytes scattered through, say, a gigabyte of RAM on the receiving node, can the receiving node then issue those, say, half a dozen random lookups/writes to the gigabyte RAM buffer concurrently?
