Nathan,

we briefly discussed the test_lock1 test from the onesided test suite using osc/pt2pt

https://github.com/open-mpi/ompi-tests/blob/master/onesided/test_lock1.c#L57-L70


task 0 does

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);

MPI_Send(...,dest=2,...)


and task 2 does

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank=1,...);

MPI_Recv(...,source=0,...)


hoping to guarantee task 0 will acquire the lock first.


once in a while, the test fails when task 2 acquires the lock first

/* MPI_Win_lock() only sends a lock request, and returns without owning the lock */

so if task 1 is running on a loaded server, then even if task 2 requests the lock *after* task 0,

the lock request from task 2 can be processed first, and hence task 0 is not guaranteed to acquire the lock *before* task 2.
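
to make the suspected race concrete, here is a toy model in plain C. it is only a sketch and not Open MPI code: the lock_request struct, its fields and first_lock_owner() are made up for illustration, and it assumes the target simply grants the exclusive lock to whichever request it happens to process first.

```c
#include <stddef.h>

/* Toy model (hypothetical, not Open MPI internals): one pending
 * exclusive-lock request as seen by the target (task 1). */
typedef struct {
    int origin;      /* rank that issued MPI_Win_lock() */
    int send_seq;    /* order in which the origin sent the request */
    int process_seq; /* order in which the target processed it */
} lock_request;

/* Return the rank whose request the target processes first,
 * i.e. the first lock owner in this model, or -1 if n == 0.
 * Note that send_seq plays no role in the decision. */
int first_lock_owner(const lock_request *reqs, size_t n)
{
    int owner = -1;
    int best = 0;

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || reqs[i].process_seq < best) {
            owner = reqs[i].origin;
            best = reqs[i].process_seq;
        }
    }
    return owner;
}
```

with requests { origin 0, sent 1st, processed 2nd } and { origin 2, sent 2nd, processed 1st }, this model makes task 2 the first owner, which matches the failure i see.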


can you please confirm MPI_Win_lock() behaves as it is supposed to ?

if yes, is there a way for task 0 to block until it acquires the lock ?


i modified the test, and inserted in task 0 an MPI_Get of 1 MPI_DOUBLE *before* MPI_Send.

see my patch below (note i increased the message length)


my expectation is that the test would either succeed (e.g. task 0 gets the lock first) or hang

(if task 2 gets the lock first)



surprisingly, the test never hangs (so far ...) but once in a while, it fails (!), which makes me very confused


Any thoughts ?


Cheers,


Gilles



diff --git a/onesided/test_lock1.c b/onesided/test_lock1.c
index c549093..9fa3f8d 100644
--- a/onesided/test_lock1.c
+++ b/onesided/test_lock1.c
@@ -20,7 +20,7 @@ int
 test_lock1(void)
 {
     double *a = NULL;
-    size_t     len = 10;
+    size_t     len = 1000000;
     MPI_Win    win;
     int        i;

@@ -56,6 +56,7 @@ test_lock1(void)
      */
     if (me == 0) {
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
+       MPI_Get(a,1,MPI_DOUBLE,1,0,1,MPI_DOUBLE,win);
         MPI_Send(NULL, 0, MPI_BYTE, 2, 1001, MPI_COMM_WORLD);
        MPI_Get(a,len,MPI_DOUBLE,1,0,len,MPI_DOUBLE,win);
         MPI_Win_unlock(1, win);
@@ -76,6 +77,7 @@ test_lock1(void)
         /* make sure 0 got the data from 1 */
        for (i = 0; i < len; i++) {
            if (a[i] != (double)(10*1+i)) {
+ if (0 == nfail) fprintf(stderr, "at index %d, expected %lf but got %lf\n", i, (double)10*1+i, a[i]);
                nfail++;
            }
        }
