Jason,

I just got an email saying that Mellanox does in fact use an ordering
for reads and writes. So I think we can blame the CPU or the PCI for
the unordered reads.

On Thu, Nov 14, 2013 at 3:05 PM, Jason Gunthorpe
<[email protected]> wrote:
> On Thu, Nov 14, 2013 at 01:12:55AM -0400, Anuj Kalia wrote:
>
>> So, another question: why are the reads unordered while the writes are
>> ordered? I think by now we can assume write ordering (my experiments +
>> MVAPICH uses it). Can the PCI reorder the reads issued by the HCA?
>
> Without fencing there is no guarantee in what order things are made
> visible, and the CPU will flush its write buffers however it likes.
I'm using fencing in the read experiment. The code at the server looks
like this:

while (1) {
    for (i = 0; i < EXTENT_CAPACITY; i++) {
        /* write the extent back-to-front, one fenced store at a time */
        ptr[EXTENT_CAPACITY - i - 1] = iter;
        asm volatile ("" : : : "memory");      /* compiler barrier */
        asm volatile ("mfence" ::: "memory");  /* order the stores */
    }
    iter++;
    usleep(2000 + (rand() % 200));
}

> The PCI subsystem can also re-order reads however it likes, that is
> part of the PCI spec. In a 2 socket system don't be surprised if cache
> lines on different sockets complete out of order.
> Think of this as a classic multi-threaded race condition, and not
> related to PCI. If you do the same test using 2 threads you probably
> get the same results.
>
The PCI explanation sounds good.
However, with a fence after every update, I don't think multiple
sockets will be a problem.
>> > Intel hardware is very good at hiding ordering issues 99% of the time,
>> > but in many cases there can be a stress'd condition that will show a
>> > different result.
>
>> Hmm.. I'm willing to run billions of iterations of the test. That
>> should give some confidence.
>
> Not really, repeating the same test billions of times is not
> comprehensive.  You need to stress the system in all sorts of
> different ways to see different behavior.
Hmm.. It's not really the same test. My server sleeps for a large,
randomly chosen duration between updates, so if the test passes for
many iterations we can assume we've covered a lot of interleavings.
But yes, that doesn't give 100% confidence.
> For instance, in a 2 socket system there are likely all sorts of crazy
> sensitivities that depend on which socket the memory lives, which
> socket holds the newest cacheline, which socket has an old line, which
> socket is connected directly to the HCA, etc.
Again, does that matter with fences? With a fence after every update,
there is a real-time ordering for when the updates appear in the cache
hierarchy, regardless of the socket.
>
> Jason

Regards,
Anuj