On Oct 21, 2009, at 13:42, Scott Atchley wrote:
On Oct 21, 2009, at 1:25 PM, George Bosilca wrote:
Brice,
Because MX doesn't provide a real RMA protocol, we created a fake
one on top of point-to-point. The two peers have to agree on a
unique tag, then the receiver posts it before the sender starts the
send. However, as this is integrated with the real RMA protocol,
where only one side knows about the completion of the RMA
operation, we still exchange the ACK at the end. Therefore, the
receiver doesn't need to know when the receive is completed, as it
will get an ACK from the sender. At least this was the original idea.
But I can see how this might fail if the short ACK from the sender
manages to pass the RMA operation on the wire. I was under the
impression (based on the fact that MX respects the ordering) that
the mx_send will trigger the completion only when all data is on
the wire/NIC memory, so I supposed there is _absolutely_ no way for
the ACK to bypass the last RMA fragments and reach the receiver
before the recv is really completed. If my supposition is not
correct, then we should remove the mx_forget and make sure that,
before we mark a fragment as completed, we have both completions
(the one from mx_recv and the remote one).
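The "both completions" rule proposed above can be sketched as a small model. This is only an illustration written in Python, not Open MPI or MX code: the Fragment class and its method names are hypothetical, and it assumes the receiver is notified separately of its local receive completion and of the sender's ACK, in either order.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical receiver-side bookkeeping for one emulated RMA fragment."""

    local_recv_done: bool = False  # the posted receive completed locally
    remote_ack_seen: bool = False  # the sender's ACK arrived

    def on_local_recv(self) -> bool:
        """Record the local receive completion; return True if fully done."""
        self.local_recv_done = True
        return self.is_complete()

    def on_remote_ack(self) -> bool:
        """Record the sender's ACK; return True if fully done."""
        self.remote_ack_seen = True
        return self.is_complete()

    def is_complete(self) -> bool:
        # The fragment counts as completed only once BOTH events were seen,
        # so an ACK that overtakes the data cannot complete it early.
        return self.local_recv_done and self.remote_ack_seen
```

With this bookkeeping, an ACK that arrives first leaves the fragment pending, and the later receive completion is the event that finishes it.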
George,
When is the ACK sent? After the "PUT" completion returns (via
mx_test(), etc.) or simply after calling mx_isend() for the "PUT" but
before the completion?
The ACK is sent by the PML layer. If I'm not mistaken, it is sent when
the completion callback is triggered, which should happen only when
the MX BTL detects the completion of the mx_isend (using mx_test).
Therefore, I think the ACK is sent in response to the completion of
the mx_isend.
george.
If the former, the ACK cannot pass the data. If the latter, it is
easily possible, especially if there is a lot of contention (and thus
a lot of route dispersion).
MX only guarantees order of matching (two identical tags will match
in order), not order of completion.
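A toy model can illustrate the "latter" case, where the ACK is posted before the "PUT" has completed. It is a rough sketch in Python under loud assumptions: each message progresses independently (as it might over different routes), transfer time depends only on size, and the function name and the bytes_per_tick parameter are made up for the illustration. It says nothing about how MX actually schedules traffic.

```python
def completion_order(messages, bytes_per_tick=1024):
    """Return message names in the order their transfers finish.

    messages: list of (name, post_tick, size_bytes) in posting order.
    Assumption: every message moves independently at bytes_per_tick.
    """
    finish = []
    for name, post_tick, size in messages:
        ticks = -(-size // bytes_per_tick)  # ceiling division
        finish.append((post_tick + ticks, name))
    # Sort by finish time; posting order plays no role in completion order.
    return [name for _, name in sorted(finish)]


# A 1 MiB "PUT" posted at tick 0, then a 64-byte ACK posted at tick 1.
order = completion_order([("put", 0, 1 << 20), ("ack", 1, 64)])
```

In this model the small ACK finishes long before the large "PUT", even though it was posted afterwards, which is the overtaking described above.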
Scott