On Jul 5, 2011, at 2:21 PM, Ralph Castain wrote:

>> OK, I think I figured out what the deadlock in my application was... 
>> please confirm if this makes sense:
>> 1. There was an 'if' condition that was met, causing 2 (out of 3) of my 
>> processes to call MPI_Finalize().
>> 2. The remaining process was still trying to run and at some point was 
>> making calls like MPI_Recv(), MPI_Send(), and MPI_Wait() while the other 
>> two processes were sitting in MPI_Finalize() (although they would never 
>> exit). The application would hang at that point, but the program was too 
>> big for me to figure out where exactly the lone running process would hang.
>> 3. I am no expert on Open MPI, so I would appreciate it if someone could 
>> confirm whether this is the expected behavior. I addressed the condition 
>> and now all processes run their course.
> 
> That is correct behavior for MPI - i.e., if one process keeps issuing MPI 
> requests while the others have already entered finalize, then the job will 
> hang: those requests can never be satisfied, so that proc never reaches 
> finalize and the job can never complete.

One clarification on this point...

If process A calls MPI_Send to process B and that send completes before B 
actually receives the message (e.g., if the message was small and there were no 
other messages pending between A and B), and then A calls MPI_Finalize, then B 
can still legally call MPI_Recv to receive the outstanding message.  That 
scenario should work fine.
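
To make that concrete, here is a minimal sketch of the legal case (not from the original program -- the ranks, tag, value, and sleep are made up, and it assumes the small message goes out eagerly so MPI_Send completes before the receive is posted):

/* Hypothetical 2-process example: rank 0 sends a small message and
 * finalizes right away; rank 1 receives it afterwards.  This is legal
 * because the send completed before rank 0 called MPI_Finalize. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Small message: likely sent eagerly, so MPI_Send returns
           (i.e., completes) even though rank 1 has not posted its
           receive yet. */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        sleep(2);   /* just to make it likely rank 0 already finalized */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 got %d after rank 0 finalized\n", value);
    }

    MPI_Finalize();
    return 0;
}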

What doesn't work is if you initiate new communication to a process that has 
called MPI_Finalize -- e.g., if you MPI_Send to a finalized process, or you try 
to MPI_Recv a message that wasn't sent before the peer finalized.
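
And a minimal sketch of the problematic pattern (again hypothetical, just to show the shape of the hang described above):

/* Hypothetical 2-process example of the failing pattern: rank 1 posts
 * a receive for a message that no one ever sent, while rank 0 goes
 * straight to MPI_Finalize.  The receive can never be matched, rank 1
 * never reaches MPI_Finalize, and the job hangs. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 1) {
        /* No matching send exists, and rank 0 has already entered
           MPI_Finalize: this blocks forever. */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();   /* rank 0 gets here; rank 1 never does */
    return 0;
}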

Make sense?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

