I didn't run that specific test, but I did run a test that calls MPI_Abort. I 
found a bug this morning, though (reported by Sam), that was causing the state 
of remote procs to be incorrectly reported.

Try with r23048 or higher.
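
In case it helps anyone reproduce this, here is a minimal sketch of the kind of 
test I mean (not any particular test from our suites, just an illustration); 
mpirun should hand back a non-zero exit status when it runs:

/* abort_test.c -- minimal sketch of a test that calls MPI_Abort().
 * Build:  mpicc abort_test.c -o abort_test
 * Run:    mpirun -np 2 ./abort_test ; echo $status    (expect non-zero)
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        printf("rank 0 calling MPI_Abort; errors are expected here\n");
        MPI_Abort(MPI_COMM_WORLD, 3);   /* mpirun should exit non-zero */
    }

    /* The other ranks should be killed by the abort before getting here. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}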

On Apr 27, 2010, at 9:15 AM, Rolf vandeVaart wrote:

> Ralph, did you get a chance to run the ibm/final test to see if these changes 
> fixed the problem?  I just rebuilt the trunk and tried it, and I still get an 
> exit status of 0 back.  I will run it again to make sure I have not made a 
> mistake.
> 
> Rolf
> 
> On 04/26/10 23:43, Ralph Castain wrote:
>> 
>> Okay, this should finally be fixed. See the commit message for r23045 for an 
>> explanation.
>> 
>> It really wasn't anything in the cited changeset that caused the problem. 
>> The root cause is that $#@$ abort file we dropped in the session dir to 
>> indicate you called MPI_Abort vs. trying to thoroughly clean up. It has been 
>> biting us in the butt for years - finally removed it.
>> 
>> 
>> On Apr 26, 2010, at 12:58 PM, Rolf vandeVaart wrote:
>> 
>>> The ibm/final test does not call MPI_Abort directly.  It calls 
>>> MPI_Barrier after MPI_Finalize has been called, which is a no-no.  This is 
>>> detected, and eventually the library calls ompi_mpi_abort().  That is very 
>>> similar to MPI_Abort(), which ultimately calls ompi_mpi_abort() as well.  So, 
>>> I guess I am saying that, for all intents and purposes, it calls MPI_Abort.
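>>> 
>>> To make that concrete, here is a minimal sketch of the failing pattern (not 
>>> the actual ibm/final source, just the shape of the code path):
>>> 
>>> /* barrier_after_finalize.c -- sketch of an MPI call made after MPI_Finalize().
>>>  * Build:  mpicc barrier_after_finalize.c -o final
>>>  * Run:    mpirun -np 1 -mca btl sm,self ./final ; echo $status   (expect non-zero)
>>>  */
>>> #include <mpi.h>
>>> #include <stdio.h>
>>> 
>>> int main(int argc, char **argv)
>>> {
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Finalize();
>>> 
>>>     printf("ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!\n");
>>> 
>>>     /* Any MPI call after MPI_Finalize is disallowed by the standard;
>>>      * Open MPI detects this and ends up in ompi_mpi_abort(), much like
>>>      * an explicit MPI_Abort() would. */
>>>     MPI_Barrier(MPI_COMM_WORLD);
>>> 
>>>     return 0;
>>> }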
>>> 
>>> Rolf
>>> 
>>> On 04/26/10 14:41, Ralph Castain wrote:
>>>> 
>>>> I'll try to keep it in mind as I continue the errmgr work. I gather these 
>>>> tests all call MPI_Abort?
>>>> 
>>>> 
>>>> On Apr 26, 2010, at 12:31 PM, Rolf vandeVaart wrote:
>>>> 
>>>>> With our MTT testing we have noticed a problem that has cropped up in the 
>>>>> trunk.  There are some tests that are supposed to return a non-zero 
>>>>> status because they are getting errors, but are instead returning 0.  
>>>>> This problem does not exist in r23022 but does exist in r23023.
>>>>> 
>>>>> One can use the ibm/final test to reproduce the problem.  An example of a 
>>>>> passing case followed by a failing case is shown below.
>>>>> 
>>>>> Ralph, do you want me to open a ticket on this, or do you just want to take 
>>>>> a look?  I am asking you since you did the r23023 commit.
>>>>> 
>>>>> Rolf
>>>>> 
>>>>> 
>>>>> TRUNK VERSION r23022:
>>>>> [rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>>>> **************************************************************************
>>>>> This test should generate a message about MPI is either not initialized or
>>>>> has already been finialized.
>>>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>>>> **************************************************************************
>>>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>>>> *** This is disallowed by the MPI standard.
>>>>> *** Your MPI job will now abort.
>>>>> [burl-ct-x2200-6:6072] Abort after MPI_FINALIZE completed successfully; 
>>>>> not able to guarantee that all other processes were killed!
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>> --------------------------------------------------------------------------
>>>>> [rolfv@burl-ct-x2200-6 environment]$ echo $status
>>>>> 1
>>>>> [rolfv@burl-ct-x2200-6 environment]$
>>>>> 
>>>>> 
>>>>> TRUNK VERSION r23023:
>>>>> [rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>>>> **************************************************************************
>>>>> This test should generate a message about MPI is either not initialized or
>>>>> has already been finialized.
>>>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>>>> **************************************************************************
>>>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>>>> *** This is disallowed by the MPI standard.
>>>>> *** Your MPI job will now abort.
>>>>> [burl-ct-x2200-6:4089] Abort after MPI_FINALIZE completed successfully; 
>>>>> not able to guarantee that all other processes were killed!
>>>>> [rolfv@burl-ct-x2200-6 environment]$ echo $status
>>>>> 0
>>>>> [rolfv@burl-ct-x2200-6 environment]$
>>>>> 
