I didn't run that specific test, but I did run a test that calls MPI_Abort. I found a bug this morning, though (reported by Sam), that was causing the state of remote procs to be incorrectly reported.
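(For reference, a minimal sketch of the kind of abort test meant here; the program below is an illustration under that assumption, not the actual harness test.)

/* Minimal abort-test sketch (illustrative only, not the actual harness test).
 * One rank calls MPI_Abort with a non-zero error code, and mpirun is expected
 * to hand a non-zero exit status back to the shell. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Abort(MPI_COMM_WORLD, 1);   /* abort with error code 1 */
    return 0;                       /* never reached */
}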
Try with r23048 or higher.

On Apr 27, 2010, at 9:15 AM, Rolf vandeVaart wrote:

> Ralph, did you get a chance to run the ibm/final test to see if these changes
> fixed the problem? I just rebuilt the trunk and tried it and I still get an
> exit status of 0 back. I will run it again to make sure I have not made a
> mistake.
>
> Rolf
>
> On 04/26/10 23:43, Ralph Castain wrote:
>>
>> Okay, this should finally be fixed. See the commit message for r23045 for an
>> explanation.
>>
>> It really wasn't anything in the cited changeset that caused the problem.
>> The root cause is that $#@$ abort file we dropped in the session dir to
>> indicate you called MPI_Abort vs. trying to thoroughly clean up. Been biting
>> us in the butt for years - finally removed it.
>>
>> On Apr 26, 2010, at 12:58 PM, Rolf vandeVaart wrote:
>>
>>> The ibm/final test does not call MPI_Abort directly. It calls
>>> MPI_Barrier after MPI_Finalize has been called, which is a no-no. This is
>>> detected, and eventually the library calls ompi_mpi_abort(). This is very
>>> similar to MPI_Abort(), which ultimately calls ompi_mpi_abort() as well.
>>> So, I guess I am saying that, for all intents and purposes, it calls
>>> MPI_Abort.
>>>
>>> Rolf
>>>
>>> On 04/26/10 14:41, Ralph Castain wrote:
>>>>
>>>> I'll try to keep it in mind as I continue the errmgr work. I gather these
>>>> tests all call MPI_Abort?
>>>>
>>>> On Apr 26, 2010, at 12:31 PM, Rolf vandeVaart wrote:
>>>>
>>>>> With our MTT testing we have noticed a problem that has cropped up in the
>>>>> trunk. There are some tests that are supposed to return a non-zero
>>>>> status because they are getting errors, but they are instead returning 0.
>>>>> This problem does not exist in r23022 but does exist in r23023.
>>>>>
>>>>> One can use the ibm/final test to reproduce the problem. An example of a
>>>>> passing case followed by a failing case is shown below.
>>>>>
>>>>> Ralph, do you want me to open a ticket on this, or do you just want to
>>>>> take a look? I am asking you since you did the r23023 commit.
>>>>>
>>>>> Rolf
>>>>>
>>>>> TRUNK VERSION r23022:
>>>>> [rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>>>> **************************************************************************
>>>>> This test should generate a message about MPI is either not initialized or
>>>>> has already been finialized.
>>>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>>>> **************************************************************************
>>>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>>>> *** This is disallowed by the MPI standard.
>>>>> *** Your MPI job will now abort.
>>>>> [burl-ct-x2200-6:6072] Abort after MPI_FINALIZE completed successfully;
>>>>> not able to guarantee that all other processes were killed!
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that the job aborted, but has no info as to the process
>>>>> that caused that situation.
>>>>> --------------------------------------------------------------------------
>>>>> [rolfv@burl-ct-x2200-6 environment]$ echo $status
>>>>> 1
>>>>> [rolfv@burl-ct-x2200-6 environment]$
>>>>>
>>>>> TRUNK VERSION r23023:
>>>>> [rolfv@burl-ct-x2200-6 environment]$ mpirun -np 1 -mca btl sm,self final
>>>>> **************************************************************************
>>>>> This test should generate a message about MPI is either not initialized or
>>>>> has already been finialized.
>>>>> ERRORS ARE EXPECTED AND NORMAL IN THIS PROGRAM!!
>>>>> **************************************************************************
>>>>> *** The MPI_Barrier() function was called after MPI_FINALIZE was invoked.
>>>>> *** This is disallowed by the MPI standard.
>>>>> *** Your MPI job will now abort.
>>>>> [burl-ct-x2200-6:4089] Abort after MPI_FINALIZE completed successfully;
>>>>> not able to guarantee that all other processes were killed!
>>>>> [rolfv@burl-ct-x2200-6 environment]$ echo $status
>>>>> 0
>>>>> [rolfv@burl-ct-x2200-6 environment]$
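(For reference, Rolf's description above of what ibm/final exercises boils down to a pattern like the sketch below. This is an assumption about the failure mode, not the actual test source: an MPI call after MPI_Finalize is erroneous, Open MPI detects it and ends up in ompi_mpi_abort(), and mpirun should therefore exit non-zero.)

/* Sketch of the failure mode described above (an assumption about what
 * ibm/final exercises, not its actual source).  The MPI_Barrier call after
 * MPI_Finalize is erroneous; Open MPI detects it and routes through
 * ompi_mpi_abort(), so mpirun should report a non-zero exit status. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Finalize();
    MPI_Barrier(MPI_COMM_WORLD);    /* erroneous: MPI is already finalized */
    return 0;
}

Running something like this under mpirun and then checking the shell's exit status (echo $status in csh, echo $? in sh) is the check shown in the transcripts above; with the fix in place it should print a non-zero value.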