I should do the experiment with 2 processes, however.
On Thu, Nov 14, 2013 at 3:33 PM, Anuj Kalia <[email protected]> wrote:
> Jason,
>
> I just got an email saying that Mellanox does in fact use an ordering
> for reads and writes. So I think we can blame the CPU or the PCI for
> the unordered reads.
>
> On Thu, Nov 14, 2013 at 3:05 PM, Jason Gunthorpe
> <[email protected]> wrote:
>> On Thu, Nov 14, 2013 at 01:12:55AM -0400, Anuj Kalia wrote:
>>
>>> So, another question: why are the reads unordered while the writes are
>>> ordered? I think by now we can assume write ordering (my experiments +
>>> MVAPICH uses it). Can the PCI bus reorder the reads issued by the HCA?
>>
>> Without fencing there is no guarantee in what order things are made
>> visible, and the CPU will flush its write buffers however it likes.
>
> I'm using fencing in the read experiment. The code at the server looks
> like this:
>
> while (1) {
>     for (i = 0; i < EXTENT_CAPACITY; i++) {
>         ptr[EXTENT_CAPACITY - i - 1] = iter;
>         asm volatile("" ::: "memory");
>         asm volatile("mfence" ::: "memory");
>     }
>     iter++;
>     usleep(2000 + (rand() % 200));
> }
>
>> The PCI subsystem can also re-order reads however it likes; that is
>> part of the PCI spec. In a 2-socket system, don't be surprised if cache
>> lines on different sockets complete out of order.
>>
>> Think of this as a classic multi-threaded race condition, and not
>> related to PCI. If you do the same test using 2 threads you will
>> probably get the same results.
>
> The PCI explanation sounds good.
>
> However, with a fence after every update, I don't think multiple
> sockets will be a problem.
>
>>> > Intel hardware is very good at hiding ordering issues 99% of the time,
>>> > but in many cases there can be a stressed condition that will show a
>>> > different result.
>>>
>>> Hmm.. I'm willing to run billions of iterations of the test. That
>>> should give some confidence.
>>
>> Not really; repeating the same test billions of times is not
>> comprehensive.
>> You need to stress the system in all sorts of
>> different ways to see different behavior.
>
> Hmm.. It's not really the same test. My server sleeps for a randomly
> chosen large duration between updates. If the test passes for many
> iterations, we can assume that we've tested a lot of interleavings.
> But yes, that doesn't give 100% confidence.
>
>> For instance, in a 2-socket system there are likely all sorts of crazy
>> sensitivities that depend on which socket the memory lives on, which
>> socket holds the newest cache line, which socket has an old line, which
>> socket is connected directly to the HCA, etc.
>
> Again, does that matter with fences? With a fence after every update,
> there is a real-time ordering for when the updates appear in the cache
> hierarchy, regardless of the socket.
>
>> Jason
>
> Regards,
> Anuj
