Hi George,

In the code I'm working on I actually have concurrent MPI_Waitany calls
on the same set of requests. Oops. All clear now.
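
For the record, here is roughly what the fixed pattern looks like. This
is only a sketch with made-up names (nr_requests, requests,
handle_completed), not the actual code from my simulation. It serializes
MPI_Waitany on the shared request set with an OpenMP critical section,
as in the workaround mentioned in the quoted thread below; the
alternative George suggests is to have each thread wait on its own
non-overlapping subset of requests.

    #include <mpi.h>
    #include <omp.h>

    void handle_completed ( int ind );   /* hypothetical per-message handler */

    /* Drain an array of already-posted requests with several OpenMP
       threads. MPI_Waitany itself is wrapped in a critical section so
       that no two threads ever wait/test on the same request
       concurrently. Once every request has been completed (and set to
       MPI_REQUEST_NULL), MPI_Waitany returns MPI_UNDEFINED and the
       threads exit the loop. */
    void drain_requests ( int nr_requests , MPI_Request *requests ) {
        #pragma omp parallel
        {
            while ( 1 ) {
                int ind;
                MPI_Status stat;
                #pragma omp critical (waitany)
                MPI_Waitany( nr_requests , requests , &ind , &stat );
                if ( ind == MPI_UNDEFINED )
                    break;                /* all requests are now inactive */
                handle_completed( ind );  /* process the completed exchange */
            }
        }
    }
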
Cheers,
Pedro


> I checked the wait_any code, and I can only see one possible execution
> path to return MPI_UNDEFINED. All requests have to be marked as
> inactive, which only happens after the OMPI request completion
> function is called.
>
> This leads to the following question. Are your threads waiting on
> common requests, or does each of them wait only on a non-overlapping
> subset? BTW, the MPI standard strictly forbids two concurrent
> wait/test operations on the same request.
>
> george.
>
> On Dec 20, 2011, at 07:31, Pedro Gonnet wrote:
>
> > Hi again,
> >
> > I have a follow-up question. I have been using MPI_Init_thread and
> > MPI_Isend/MPI_Irecv/MPI_Waitany for a while now and have stumbled
> > over what may be a bug in MPI_Waitany...
> >
> > Within a parallel region of the code (in this case I am using
> > OpenMP), calls to MPI_Isend and MPI_Irecv work fine. If, however, I
> > have several threads calling MPI_Waitany at the same time, some of
> > the calls will return with an index MPI_UNDEFINED although there
> > are still recvs waiting.
> >
> > In OpenMP, if I wrap the calls to MPI_Waitany in a "#pragma omp
> > critical", everything works just fine.
> >
> > The reason I'm calling these functions in a parallel context is
> > that although MPI_Isend/MPI_Irecv are asynchronous, work
> > (communication) only seems to get done when I call MPI_Waitany. I
> > therefore spawn several threads which deal with the received data
> > in turn, filling the voids caused by communication. Oh, and all of
> > this goes on while other threads compute other things in the
> > background.
> >
> > Could it be that there is a concurrency bug in MPI_Waitany?
> >
> > Cheers,
> > Pedro
> >
> >> Sorry for the delay -- I just replied on the users list. I think
> >> you need to use MPI_INIT_THREAD with MPI_THREAD_MULTIPLE. See if
> >> that helps.
> >>
> >> On Oct 26, 2011, at 7:19 AM, Pedro Gonnet wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'm forwarding this message from the "users" mailing list as it
> >>> wasn't getting any attention there and I believe this is a
> >>> bona-fide bug.
> >>>
> >>> The issue is that if an MPI node has two threads, one exchanging
> >>> data with other nodes through the non-blocking routines, the
> >>> other exchanging data with MPI_Allreduce, the system hangs.
> >>>
> >>> The attached example program reproduces this bug. It can be
> >>> compiled and run using the following:
> >>>
> >>> mpicc -g -Wall mpitest.c -pthread
> >>> mpirun -np 8 xterm -e gdb -ex run ./a.out
> >>>
> >>> Note that you may need to fiddle with the delay in line 146 to
> >>> reproduce the problem.
> >>>
> >>> Many thanks,
> >>> Pedro
> >>>
> >>> -------- Forwarded Message --------
> >>> From: Pedro Gonnet <gonnet_at_[hidden]>
> >>> To: users <users_at_[hidden]>
> >>> Subject: Re: Troubles using MPI_Isend/MPI_Irecv/MPI_Waitany and
> >>> MPI_Allreduce
> >>> Date: Sun, 23 Oct 2011 18:11:50 +0100
> >>>
> >>> Hi again,
> >>>
> >>> As promised, I implemented a small program reproducing the error.
> >>>
> >>> The program's main routine spawns a pthread which calls the
> >>> function "exchange". "exchange" uses
> >>> MPI_Isend/MPI_Irecv/MPI_Waitany to exchange a buffer of
> >>> double-precision numbers with all other nodes.
> >>>
> >>> At the same time, the "main" routine exchanges the sum of all the
> >>> buffers using MPI_Allreduce.
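
(Aside for readers who don't have the attachment: the sketch below shows
roughly the structure just described. It is not the actual mpitest.c --
the buffer size, the rank limit and all variable names are invented for
illustration, and the tunable usleep delay mentioned further down is not
reproduced here.)

    #include <pthread.h>
    #include <mpi.h>

    #define N 1000
    #define MAX_NODES 64      /* assumed upper bound on ranks, for brevity */

    double buff[ N ];

    /* Thread (i): exchange buff with every other rank using
       MPI_Isend/MPI_Irecv and drain the requests with MPI_Waitany. */
    void *exchange ( void *dummy ) {
        int nr_nodes, myrank, k, ind;
        static double recv[ MAX_NODES ][ N ];
        MPI_Request reqs[ 2*MAX_NODES ];
        MPI_Status stat;
        MPI_Comm_size( MPI_COMM_WORLD , &nr_nodes );
        MPI_Comm_rank( MPI_COMM_WORLD , &myrank );
        for ( k = 0 ; k < nr_nodes ; k++ ) {
            reqs[ 2*k ] = MPI_REQUEST_NULL;
            reqs[ 2*k + 1 ] = MPI_REQUEST_NULL;
            if ( k == myrank )
                continue;
            MPI_Isend( buff , N , MPI_DOUBLE , k , 0 , MPI_COMM_WORLD , &reqs[ 2*k ] );
            MPI_Irecv( recv[k] , N , MPI_DOUBLE , k , 0 , MPI_COMM_WORLD , &reqs[ 2*k + 1 ] );
        }
        while ( 1 ) {
            MPI_Waitany( 2*nr_nodes , reqs , &ind , &stat );
            if ( ind == MPI_UNDEFINED )
                break;        /* all sends and recvs have completed */
        }
        return NULL;
    }

    /* Thread (ii), i.e. main: sum the local buffer and MPI_Allreduce it
       while the exchange thread may still be inside MPI_Waitany. */
    int main ( int argc , char *argv[] ) {
        int provided, k;
        double sum = 0.0, total;
        pthread_t thread;
        MPI_Init_thread( &argc , &argv , MPI_THREAD_MULTIPLE , &provided );
        for ( k = 0 ; k < N ; k++ )
            buff[k] = 1.0;
        pthread_create( &thread , NULL , exchange , NULL );
        for ( k = 0 ; k < N ; k++ )
            sum += buff[k];
        MPI_Allreduce( &sum , &total , 1 , MPI_DOUBLE , MPI_SUM , MPI_COMM_WORLD );
        pthread_join( thread , NULL );
        MPI_Finalize();
        return 0;
    }
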
> >>>
> >>> To compile and run the program, do the following:
> >>>
> >>> mpicc -g -Wall mpitest.c -pthread
> >>> mpirun -np 8 ./a.out
> >>>
> >>> Timing is, of course, of the essence and you may have to run the
> >>> program a few times or twiddle with the value of "usleep" in line
> >>> 146 for it to hang. To see where things go bad, you can do the
> >>> following:
> >>>
> >>> mpirun -np 8 xterm -e gdb -ex run ./a.out
> >>>
> >>> Things go bad when MPI_Allreduce is called while any of the
> >>> threads are in MPI_Waitany. The value of "usleep" in line 146
> >>> should be long enough for all the nodes to have started
> >>> exchanging data but small enough so that they are not done yet.
> >>>
> >>> Cheers,
> >>> Pedro
> >>>
> >>> On Thu, 2011-10-20 at 11:25 +0100, Pedro Gonnet wrote:
> >>>> Short update:
> >>>>
> >>>> I just installed version 1.4.4 from source (compiled with
> >>>> --enable-mpi-threads), and the problem persists.
> >>>>
> >>>> I should also point out that if, in thread (ii), I wait for the
> >>>> nonblocking communication in thread (i) to finish, nothing bad
> >>>> happens. But this makes the nonblocking communication somewhat
> >>>> pointless.
> >>>>
> >>>> Cheers,
> >>>> Pedro
> >>>>
> >>>> On Thu, 2011-10-20 at 10:42 +0100, Pedro Gonnet wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I am currently working on a multi-threaded hybrid parallel
> >>>>> simulation which uses both pthreads and OpenMPI. The simulation
> >>>>> uses several pthreads per MPI node.
> >>>>>
> >>>>> My code uses the nonblocking routines
> >>>>> MPI_Isend/MPI_Irecv/MPI_Waitany quite successfully to implement
> >>>>> the node-to-node communication. When I try to interleave other
> >>>>> computations during this communication, however, bad things
> >>>>> happen.
> >>>>>
> >>>>> I have two MPI nodes with two threads each: one thread (i)
> >>>>> doing the nonblocking communication and the other (ii) doing
> >>>>> other computations. At some point, the threads (ii) need to
> >>>>> exchange data using MPI_Allreduce, which fails if the first
> >>>>> thread (i) has not completed all the communication, i.e. if
> >>>>> thread (i) is still in MPI_Waitany.
> >>>>>
> >>>>> Using the in-place MPI_Allreduce, I get a re-run of this bug:
> >>>>> http://www.open-mpi.org/community/lists/users/2011/09/17432.php.
> >>>>> If I don't use in-place, the call to MPI_Waitany (thread ii) on
> >>>>> one of the MPI nodes waits forever.
> >>>>>
> >>>>> My guess is that when the thread (ii) calls MPI_Allreduce, it
> >>>>> gets whatever the other node sent with MPI_Isend to thread (i),
> >>>>> drops whatever it should have been getting from the other
> >>>>> node's MPI_Allreduce, and the call to MPI_Waitall hangs.
> >>>>>
> >>>>> Is this a known issue? Is MPI_Allreduce not designed to work
> >>>>> alongside the nonblocking routines? Is there a "safe" variant
> >>>>> of MPI_Allreduce I should be using instead?
> >>>>>
> >>>>> I am using OpenMPI version 1.4.3 (version 1.4.3-1ubuntu3 of the
> >>>>> package openmpi-bin in Ubuntu). Both MPI nodes are run on the
> >>>>> same dual-core computer (Lenovo x201 laptop).
> >>>>>
> >>>>> If you need more information, please do let me know! I'll also
> >>>>> try to cook up a small program reproducing this problem...
> >>>>>
> >>>>> Cheers and kind regards,
> >>>>> Pedro
> >>>
> >>> <mpitest.c>
> >>
> >> --
> >> Jeff Squyres
> >> jsquyres_at_[hidden]
> >> For corporate legal information go to:
> >> http://www.cisco.com/web/about/doing_business/legal/cri/
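
A closing note on the MPI_INIT_THREAD advice quoted above: it is worth
verifying at runtime that the library really grants
MPI_THREAD_MULTIPLE, since a build without full thread support (e.g.
one configured without --enable-mpi-threads) can return a lower level
in "provided" without failing. A minimal sketch, not taken from any of
the programs above:

    #include <stdio.h>
    #include <mpi.h>

    int main ( int argc , char *argv[] ) {
        int provided;
        /* Request full thread support and check what was granted. */
        MPI_Init_thread( &argc , &argv , MPI_THREAD_MULTIPLE , &provided );
        if ( provided < MPI_THREAD_MULTIPLE ) {
            fprintf( stderr , "error: MPI_THREAD_MULTIPLE not provided (got %d).\n" , provided );
            MPI_Abort( MPI_COMM_WORLD , 1 );
        }
        /* ... safe to make MPI calls from multiple threads from here on ... */
        MPI_Finalize();
        return 0;
    }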