Hi Gilles, Nathan,

I read the MPI standard, and I think it does not require an additional barrier in the test program.
From the standard (11.5.1 Fence):

    "A fence call usually entails a barrier synchronization: a process
    completes a call to MPI_WIN_FENCE only after all other processes in
    the group entered their matching call. However, a call to
    MPI_WIN_FENCE that is known not to end any epoch (in particular, a
    call with assert equal to MPI_MODE_NOPRECEDE) does not necessarily
    act as a barrier."

This sentence is misleading. In the non-MPI_MODE_NOPRECEDE case, a barrier is necessary in the MPI implementation to end access/exposure epochs. In the MPI_MODE_NOPRECEDE case, no barrier is necessary in the MPI implementation to end access/exposure epochs. Also, a *global* barrier is not necessary in the MPI implementation to start access/exposure epochs, but some synchronization is still needed to start an exposure epoch.

For example, assume all ranks call MPI_WIN_FENCE(MPI_MODE_NOPRECEDE) and then rank 0 calls MPI_PUT to rank 1. In this case, rank 0 can access the window on rank 1 before rank 2 or the other ranks call MPI_WIN_FENCE. (But rank 0 must wait for rank 1's MPI_WIN_FENCE.) I think this is the intent of the sentence in the MPI standard cited above.

Thanks,

Takahiro Kawashima

> Hi Rolf,
>
> yes, same issue ...
>
> i attached a patch to the github issue (the issue might be in the test).
>
> From the standard (11.5 Synchronization Calls):
> "The MPI_WIN_FENCE collective synchronization call supports a simple
> synchronization pattern that is often used in parallel computations:
> namely a loosely-synchronous model, where global computation phases
> alternate with global communication phases."
>
> as far as i understand (disclaimer, i am *not* good at reading
> standards ...) this is not necessarily an MPI_Barrier, so there is a
> race condition in the test case that can be avoided by adding an
> MPI_Barrier after initializing RecvBuff.
>
> could someone (Jeff ? George ?) please double check this before i push
> a fix into the ompi-tests repo ?
>
> Cheers,
>
> Gilles
>
> On 4/20/2015 10:19 PM, Rolf vandeVaart wrote:
> >
> > Hi Gilles:
> >
> > Is your failure similar to this ticket?
> >
> > https://github.com/open-mpi/ompi/issues/393
> >
> > Rolf
> >
> > *From:* devel [mailto:devel-boun...@open-mpi.org] *On Behalf Of*
> > Gilles Gouaillardet
> > *Sent:* Monday, April 20, 2015 9:12 AM
> > *To:* Open MPI Developers
> > *Subject:* [OMPI devel] c_accumulate
> >
> > Folks,
> >
> > i (sometimes) get some failures with the c_accumulate test from the
> > ibm test suite on one host with 4 mpi tasks
> >
> > so far, i was only able to observe this on linux/sparc with the vader btl
> >
> > here is a snippet of the test :
> >
> >     MPI_Win_create(&RecvBuff, sizeOfInt, 1, MPI_INFO_NULL,
> >                    MPI_COMM_WORLD, &Win);
> >
> >     SendBuff = rank + 100;
> >     RecvBuff = 0;
> >
> >     /* Accumulate to everyone, just for the heck of it */
> >
> >     MPI_Win_fence(MPI_MODE_NOPRECEDE, Win);
> >     for (i = 0; i < size; ++i)
> >         MPI_Accumulate(&SendBuff, 1, MPI_INT, i, 0, 1, MPI_INT,
> >                        MPI_SUM, Win);
> >     MPI_Win_fence((MPI_MODE_NOPUT | MPI_MODE_NOSUCCEED), Win);
> >
> > when the test fails, RecvBuff is (rank+100) instead of the accumulated
> > value (100 * nprocs + (nprocs - 1) * nprocs / 2)
> >
> > i am not familiar with onesided operations nor MPI_Win_fence.
> >
> > that being said, i found it suspicious that RecvBuff is initialized
> > *after* MPI_Win_create ...
> >
> > does MPI_Win_fence imply MPI_Barrier ?
> >
> > if not, i guess RecvBuff should be initialized *before* MPI_Win_create.
> >
> > does that make sense ?
> >
> > (and if it does make sense, then this issue is not related to sparc,
> > and vader is not the root cause)
> >
> > Cheers,
> >
> > Gilles