Gilles,
I looked at the test and I think the current behavior is indeed
correct. What matters for an exclusive lock is that all operations in
an epoch (everything between lock and unlock) are applied atomically
at the destination (i.e., not interleaved with other updates). As
Nathan stated, MPI_Win_lock might be implemented as non-blocking, in
which case it is perfectly legitimate for process 2 to acquire the
lock first and update the array before process 0 accesses it. Thus
the test can fail.
The test will never deadlock: even if MPI_Win_lock is implemented as
a blocking operation (which is also legitimate), the send and receive
match correctly with the lock/unlock.
Moreover, I think the behavior described by the comments can only be
implemented by enforcing an order among the conceptually meaningful
operations: unlock, send, and recv. For example:
if (me == 0) {
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
    MPI_Get(a, len, MPI_DOUBLE, 1, 0, len, MPI_DOUBLE, win);
    MPI_Win_unlock(1, win);
    MPI_Send(NULL, 0, MPI_BYTE, 2, 1001, MPI_COMM_WORLD);
}
if (me == 2) {
    /* this blocks until 0 has released the lock */
    MPI_Recv(NULL, 0, MPI_BYTE, 0, 1001, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
    MPI_Put(a, len, MPI_DOUBLE, 1, 0, len, MPI_DOUBLE, win);
    MPI_Win_unlock(1, win);
}
However, if we relax the code a little and only want to ensure the
atomicity of the operations, then we need to change the check to
verify that either no element of the array has been altered, or all
of them have been altered (set to zero by process 2).
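A minimal sketch of such a check (assuming, as above, that process 2
overwrites the elements with zeroes):

    int all_altered = 1, none_altered = 1;
    for (i = 0; i < len; i++) {
        if (a[i] != 0.0)              all_altered  = 0;  /* not the value put by 2 */
        if (a[i] != (double)(10*1+i)) none_altered = 0;  /* not the original value */
    }
    /* the exclusive lock guarantees all-or-nothing, not a particular order */
    if (!all_altered && !none_altered) nfail++;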
George.
On Mon, Nov 21, 2016 at 8:57 PM, Nathan Hjelm <hje...@me.com> wrote:
To be safe I would call MPI_Get and then MPI_Win_flush. The lock will
always be acquired before the MPI_Win_flush call returns, as long as
the operation is more than 0 bytes; we always short-circuit 0-byte
operations in both osc/rdma and osc/pt2pt.
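In code, the pattern would look something like this (just a sketch;
win and the target rank 1 are taken from the test):

    double tmp;
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
    /* a non-zero-size operation (0-byte operations are short-circuited) */
    MPI_Get(&tmp, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    /* returns only once the lock on target 1 has been acquired */
    MPI_Win_flush(1, win);
    /* ... rest of the epoch ... */
    MPI_Win_unlock(1, win);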
-Nathan
> On Nov 21, 2016, at 8:54 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>
> Thanks Nathan,
>
>
> any thoughts about my modified version of the test ?
>
> do i need to MPI_Win_flush() after the first MPI_Get() in order to ensure the lock was acquired ?
>
> (and hence the program will either succeed or hang, but never fail)
>
>
> Cheers,
>
>
> Gilles
>
>
> On 11/22/2016 12:29 PM, Nathan Hjelm wrote:
>> MPI_Win_lock does not have to be blocking. In osc/rdma it is blocking in most cases but not all (lock-all with on-demand locking is non-blocking), but in osc/pt2pt it is almost always non-blocking (it has to be blocking only for proc self). If you really want to ensure the lock is acquired you can call MPI_Win_flush. I think this should work even if you have not started any RMA operations inside the epoch.
>>
>> -Nathan
>>
>>> On Nov 21, 2016, at 7:53 PM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
>>>
>>> Nathan,
>>>
>>>
>>> we briefly discussed the test_lock1 test from the onesided test suite using osc/pt2pt
>>>
>>>
>>> https://github.com/open-mpi/ompi-tests/blob/master/onesided/test_lock1.c#L57-L70
>>>
>>>
>>> task 0 does
>>>
>>> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
>>>
>>> MPI_Send(...,dest=2,...)
>>>
>>>
>>> and task 2 does
>>>
>>> MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);
>>>
>>> MPI_Recv(...,source=0,...)
>>>
>>>
>>> hoping to guarantee task 0 will acquire the lock first.
>>>
>>>
>>> once in a while, the test fails when task 2 acquires the lock first
>>>
>>> /* MPI_Win_lock() only sends a lock request, and returns without owning the lock */
>>>
>>> so if task 1 is running on a loaded server, then even if task 2 requests the lock *after* task 0, the lock request from task 2 can be processed first, and hence task 0 is not guaranteed to acquire the lock *before* task 2.
>>>
>>>
>>> can you please confirm MPI_Win_lock() behaves as it is supposed to ?
>>>
>>> if yes, is there a way for task 0 to block until it acquires the lock ?
>>>
>>>
>>> i modified the test, and inserted in task 0 an MPI_Get of one MPI_DOUBLE *before* the MPI_Send.
>>>
>>> see my patch below (note i increased the message length)
>>>
>>>
>>> my expectation is that the test would either succeed (if task 0 gets the lock first) or hang (if task 2 gets the lock first)
>>>
>>>
>>>
>>> surprisingly, the test never hangs (so far ...) but once in a while it fails (!), which makes me very confused
>>>
>>>
>>> Any thoughts ?
>>>
>>>
>>> Cheers,
>>>
>>>
>>> Gilles
>>>
>>>
>>>
>>> diff --git a/onesided/test_lock1.c b/onesided/test_lock1.c
>>> index c549093..9fa3f8d 100644
>>> --- a/onesided/test_lock1.c
>>> +++ b/onesided/test_lock1.c
>>> @@ -20,7 +20,7 @@ int
>>>  test_lock1(void)
>>>  {
>>>      double *a = NULL;
>>> -    size_t len = 10;
>>> +    size_t len = 1000000;
>>>      MPI_Win win;
>>>      int i;
>>>
>>> @@ -56,6 +56,7 @@ test_lock1(void)
>>>       */
>>>      if (me == 0) {
>>>          MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
>>> +        MPI_Get(a,1,MPI_DOUBLE,1,0,1,MPI_DOUBLE,win);
>>>          MPI_Send(NULL, 0, MPI_BYTE, 2, 1001, MPI_COMM_WORLD);
>>>          MPI_Get(a,len,MPI_DOUBLE,1,0,len,MPI_DOUBLE,win);
>>>          MPI_Win_unlock(1, win);
>>> @@ -76,6 +77,7 @@ test_lock1(void)
>>>      /* make sure 0 got the data from 1 */
>>>      for (i = 0; i < len; i++) {
>>>          if (a[i] != (double)(10*1+i)) {
>>> +            if (0 == nfail) fprintf(stderr, "at index %d, expected %lf but got %lf\n", i, (double)10*1+i, a[i]);
>>>              nfail++;
>>>          }
>>>      }
>>>
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/devel