Jason,

Thanks again :).

I found another similar thread:
http://www.spinics.net/lists/linux-rdma/msg02709.html. The conclusion
there was that although the InfiniBand specification doesn't define any
ordering for RDMA writes, many people assume left-to-right ordering
anyway. The thread doesn't mention reads, though.

So I ran the micro-experiments, and found that although writes follow
left-to-right ordering, reads do not. Details follow:

1. Write ordering experiment:
1.a. In the nth iteration, the client writes a buffer of C (~1024)
integers, each equal to n, to the server. The client sleeps for 2000 us
between iterations.
1.b. The server busy-polls the Cth (rightmost) integer. When it changes
from i to i+1, the server checks whether the entire buffer equals i+1.
The check always passes (I've tried over 15 million checks). The check
fails if the polled integer is not the rightmost one.
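A rough sketch of the server-side check in 1.b, in C. It covers only the polling logic on a buffer that the client's RDMA writes land in; the function names and the -1 failure sentinel are hypothetical (iteration values are assumed non-negative), the ibverbs setup is omitted, and a portable version would also need explicit memory barriers rather than relying on volatile alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if every element of buf equals expected. */
static bool buffer_is_uniform(const volatile int *buf, size_t c, int expected)
{
    for (size_t k = 0; k < c; k++)
        if (buf[k] != expected)
            return false;
    return true;
}

/* Step 1.b: busy-poll the rightmost integer (index c-1) until it differs
 * from last_seen, then verify that the whole buffer holds the new value.
 * Returns the new value on success, or -1 if a stale element was found. */
static int poll_rightmost_and_check(const volatile int *buf, size_t c,
                                    int last_seen)
{
    assert(c > 0);
    int v;
    while ((v = buf[c - 1]) == last_seen)
        ; /* spin until the rightmost element changes */
    return buffer_is_uniform(buf, c, v) ? v : -1;
}
```

Called in a loop with last_seen set to the previous return value, a -1 would mean the rightmost integer arrived before some integer to its left in that iteration.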

2. Read ordering experiment:
2.a. In the nth iteration, the server writes n into each of C (~1024)
integers in a local buffer, in reverse order (starting from index C-1,
so index 0 is written last). It then sleeps for 2000 us.
2.b. The client continuously reads the buffer into a local read sink.
When the first integer in the read sink changes from i to i+1, it
checks whether all the integers in the sink are i+1. This check fails,
although rarely.
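A rough sketch of both halves of experiment 2, in C. The client's check here keys on index 0, the element the server writes last in 2.a, since that is the element the left-to-right reasoning relies on. The function names and the 1/0/-1 return convention are hypothetical, and the RDMA read that fills the sink is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Step 2.a: the server stores n into every slot, from index c-1 down to 0,
 * so index 0 is the last element written in each iteration. */
static void server_fill_reverse(volatile int *buf, size_t c, int n)
{
    for (size_t k = c; k-- > 0;)
        buf[k] = n;
}

/* Step 2.b: inspect one read sink, given the value of index 0 seen in the
 * previous read. Returns 0 if index 0 has not advanced to prev+1,
 * 1 if it has and every element is prev+1, and -1 if it has but some
 * element is stale. */
static int client_check_sink(const int *sink, size_t c, int prev)
{
    assert(c > 0);
    if (sink[0] != prev + 1)
        return 0;
    for (size_t k = 1; k < c; k++)
        if (sink[k] != prev + 1)
            return -1;
    return 1;
}
```

If reads were performed strictly left to right, client_check_sink could never return -1 while the server fills the buffer with server_fill_reverse; the rare failures correspond to that case.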

This shows that reads are NOT ordered left to right. The read pattern
I'd expect is HHHH...HHHH (where H means i+1). However, I sometimes see
patterns like HH..LLLLL...HH (where L means i). Under the (evidently
false) assumption that reads happen left to right, that shouldn't be
possible: index 0 is the last integer the server updates, so once the
read sees i+1 there, no i's should linger further to the right.
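A small C helper, sketched here as a hypothetical aid rather than the code actually used in the experiment, that turns a read sink into the H/L notation above: 'H' for i+1, 'L' for i, and '?' for any other value.

```c
#include <assert.h>
#include <stddef.h>

/* Writes one character per integer of sink into out: 'H' if the element
 * holds the new value (i+1), 'L' if it still holds the old value (i),
 * and '?' otherwise. out must have room for c + 1 characters. */
static void render_pattern(const int *sink, size_t c, int i, char *out)
{
    assert(out != NULL);
    for (size_t k = 0; k < c; k++)
        out[k] = sink[k] == i + 1 ? 'H' : sink[k] == i ? 'L' : '?';
    out[c] = '\0';
}
```

For example, a sink of {4, 4, 3, 3, 4, 4} with i = 3 renders as "HHLLHH", the shape of the unexpected patterns described above.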

Curiously, whenever there are stale i's, they come in cacheline-sized
chunks: I usually see 16 i's (64 bytes with 4-byte integers, i.e. one
64-byte cacheline) or 48 i's (three cachelines).

2.c. The check always succeeds if C is 16 (16 4-byte integers are 64
bytes, so the buffer fits inside one cacheline). I've done 15 million
checks and will run many more tonight.
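One way to test the cacheline observation on each failing read is to locate the stale run and check whether its start and length are whole cachelines. This is a hypothetical C sketch; it assumes 4-byte integers, 64-byte cachelines, and a read sink whose first element is cacheline-aligned, none of which the measurements above establish.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 64-byte cachelines and a cacheline-aligned sink. */
#define CACHELINE_BYTES 64
#define INTS_PER_LINE (CACHELINE_BYTES / sizeof(int))

/* Finds the first contiguous run of stale (== old) elements in sink and
 * stores its start index and length. Returns 1 if both are whole multiples
 * of a cacheline, 0 if the run is not cacheline-aligned, and -1 if no
 * element is stale. */
static int stale_run_is_cacheline_aligned(const int *sink, size_t c, int old,
                                          size_t *start, size_t *len)
{
    assert(start != NULL && len != NULL);
    size_t k = 0;
    while (k < c && sink[k] != old)
        k++;
    if (k == c)
        return -1;
    *start = k;
    while (k < c && sink[k] == old)
        k++;
    *len = k - *start;
    return (*start % INTS_PER_LINE == 0 && *len % INTS_PER_LINE == 0) ? 1 : 0;
}
```

Under these assumptions, a run of 16 stale integers starting at index 16 would return 1, as would a 48-integer run starting on a cacheline boundary.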

So, another question: why are reads unordered while writes are
ordered? I think by now we can assume write ordering (my experiments
support it, and MVAPICH relies on it). Can PCI reorder the reads issued
by the HCA?

On Wed, Nov 13, 2013 at 2:09 PM, Jason Gunthorpe
<[email protected]> wrote:
> On Wed, Nov 13, 2013 at 02:55:53AM -0400, Anuj Kalia wrote:
>
>> I don't know what you meant by burst writes: do you mean several RDMA
>> writes or one large write? I'm concerned with the order in which data
>
> An RDMA write will be split up by the HCA into a burst of PCI MemoryWr
> operations.
>
>> I guess now is the time I run lots of micro experiments. Thanks a lot
>> for the help everyone.
>
> Careful, experiments can't prove that order is guaranteed to be
> present, they can only show if it certainly isn't.

Aah, unfortunately that's true. However, I ran the experiments anyway.
If people have been assuming an ordering on writes, I figured I could
check whether reads are ordered too.

> Intel hardware is very good at hiding ordering issues 99% of the time,
> but in many cases there can be a stressed condition that will show a
> different result.

Hmm. I'm willing to run billions of iterations of the test. That
should give some confidence.

> Jason