On Jul 5, 2011, at 2:21 PM, Ralph Castain wrote:

>> Ok, I think I figured out what the deadlock in my application was, and
>> please confirm if this makes sense:
>>
>> 1. There was an 'if' condition that was met, causing 2 (out of 3) of my
>> processes to call MPI_Finalize().
>>
>> 2. The remaining process was still trying to run and at some point was
>> making calls like MPI_Recv(), MPI_Send(), and MPI_Wait() while the other
>> two processes were at MPI_Finalize() (although they would never exit).
>> The application would hang at that point, but the program was too big for
>> me to figure out where exactly the lone running process would hang.
>>
>> 3. I am no expert on Open MPI, so I would appreciate it if someone could
>> confirm whether this is expected behavior. I addressed the condition and
>> now all processes run their course.
>
> That is correct behavior for MPI -- i.e., if one process is rattling off MPI
> requests while the others have already entered finalize, then the job will
> hang, since the requests cannot possibly be met and that proc never calls
> finalize to release completion of the job.
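To make the scenario concrete, here is a minimal sketch that reproduces that kind of hang; the file name and the specific ranks are illustrative, assuming ranks 1 and 2 hit the early 'if' branch and finalize while rank 0 posts a receive that can never be matched:

    /* hang_example.c (hypothetical name): run with 3 processes.
     * Ranks 1 and 2 finalize early; rank 0 then blocks forever. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 3) {
            if (rank == 0) fprintf(stderr, "run with 3 processes\n");
            MPI_Finalize();
            return 1;
        }

        if (rank != 0) {
            /* Ranks 1 and 2 hit the early 'if' condition and finalize
             * without ever sending anything to rank 0. */
            MPI_Finalize();
            return 0;
        }

        /* Rank 0 keeps running and posts a receive from rank 1.  No matching
         * send was ever issued, so this blocks forever; rank 0 never reaches
         * MPI_Finalize and the whole job hangs. */
        int value;
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        printf("never reached\n");
        MPI_Finalize();
        return 0;
    }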
One clarification on this point... If process A calls MPI_Send to process B and that send completes before B actually receives the message (e.g., if the message was small and there were no other messages pending between A and B), and then A calls MPI_Finalize, then B can still legally call MPI_Recv to receive the outstanding message. That scenario should work fine.

What doesn't work is if you initiate new communication to a process that has already called MPI_Finalize -- e.g., if you MPI_Send to a finalized process, or you try to MPI_Recv a message that wasn't sent before the peer finalized.

Make sense?

--
Jeff Squyres
jsquy...@cisco.com
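A minimal sketch of the legal case described above, assuming the message is small enough to complete on the sender's side before the receive is posted; the file name and the sleep() (used only to make rank 0 reach MPI_Finalize first) are illustrative:

    /* legal_example.c (hypothetical name): run with 2 processes.
     * Rank 0 sends a small message and finalizes; rank 1 may still
     * legally receive that message afterward. */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            int value = 42;
            /* A small message typically completes on the sender's side
             * before the receiver posts its receive. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Finalize();      /* rank 0 is done */
            return 0;
        }

        /* rank 1 */
        sleep(2);                /* let rank 0 reach MPI_Finalize first */
        int value;
        /* Receiving a message that was already sent before the peer
         * finalized is legal and completes normally. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d after rank 0 finalized\n", value);

        MPI_Finalize();
        return 0;
    }

By contrast, if rank 1 posted a receive for a message that rank 0 never sent before finalizing (or tried to send to rank 0 after it finalized), that communication could never complete and the job would hang, as in the original report.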