Okay, this should finally be fixed. See the commit message for r23045 for an 
explanation.

It really wasn't anything in the cited changeset that caused the problem. The 
root cause is that $#@$ abort file we dropped in the session dir to indicate 
that you called MPI_Abort, vs. trying to thoroughly clean up. It has been 
biting us in the butt for years - finally removed it.
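
For anyone who wants to confirm the fix locally, the check from Rolf's report 
(the job must exit non-zero) can be sketched roughly as below. This is only an 
illustration, not part of the commit: it assumes a POSIX sh, where the exit 
status is $? rather than csh's $status, and the helper name expect_failure is 
made up for this example.

```shell
# Hypothetical helper (not part of Open MPI or MTT): runs the given command
# and succeeds only if that command exits with a non-zero status.
expect_failure() {
    rc=0
    "$@" || rc=$?    # capture the status without tripping "set -e"
    if [ "$rc" -ne 0 ]; then
        echo "ok: exit status $rc"
        return 0
    fi
    echo "FAIL: exit status 0, expected non-zero"
    return 1
}
```

With the command line from the report, that would be run as 
"expect_failure mpirun -np 1 -mca btl sm,self final".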


On Apr 26, 2010, at 12:58 PM, Rolf vandeVaart wrote:

> The ibm/final test does not call MPI_Abort directly.  It is calling 
> MPI_Barrier after MPI_Finalize is called, which is a no-no.  This is detected 
> and eventually the library calls ompi_mpi_abort().  This is very similar to 
> MPI_Abort() which ultimately calls ompi_mpi_abort as well.  So, I guess I am 
> saying for all intents and purposes, it calls MPI_Abort.
> 
> Rolf
> 
> On 04/26/10 14:41, Ralph Castain wrote:
>> 
>> I'll try to keep it in mind as I continue the errmgr work. I gather these 
>> tests all call MPI_Abort?
>> 
>> 
>> On Apr 26, 2010, at 12:31 PM, Rolf vandeVaart wrote:
>> 
>>   
>>> With our MTT testing we have noticed a problem that has cropped up in the 
>>> trunk.  There are some tests that are supposed to return a non-zero status 
>>> because they are getting errors, but are instead returning 0.  This problem 
>>> does not exist in r23022 but does exist in r23023.
>>> 
>>> One can use the ibm/final test to reproduce the problem.  An example of a 
>>> passing case followed by a failing case is shown below.
>>> 
>>> Ralph, you want me to open a ticket on this?  Or do you just want to take a 
>>> look?  I am asking you since you did the r23023 commit.
>>> 
>>> Rolf
>>> 
>>> 
>>> TRUNK VERSION r23022:
>>> [rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>> **************************************************************************
>>> This test should generate a message about MPI is either not initialized or
>>> has already been finialized.
>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>> **************************************************************************
>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>> *** This is disallowed by the MPI standard.
>>> *** Your MPI job will now abort.
>>> [burl-ct-x2200-6:6072] Abort after MPI_FINALIZE completed successfully; not 
>>> able to guarantee that all other processes were killed!
>>> --------------------------------------------------------------------------
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --------------------------------------------------------------------------
>>> [rolfv@burl-ct-x2200-6 environment]$ echo $status
>>> 1
>>> [rolfv@burl-ct-x2200-6 environment]$
>>> 
>>> 
>>> TRUNK VERSION r23023:
>>> [rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>> **************************************************************************
>>> This test should generate a message about MPI is either not initialized or
>>> has already been finialized.
>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>> **************************************************************************
>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>> *** This is disallowed by the MPI standard.
>>> *** Your MPI job will now abort.
>>> [burl-ct-x2200-6:4089] Abort after MPI_FINALIZE completed successfully; not 
>>> able to guarantee that all other processes were killed!
>>> [rolfv@burl-ct-x2200-6 environment]$ echo $status
>>> 0
>>> [rolfv@burl-ct-x2200-6 environment]$
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>     
>> 
> 
