Thanks Tim - good suggestion! Had to modify your proposed code a tad to get it to compile and work, but it is definitely a cleaner solution.
Ralph


On 3/6/08 1:34 PM, "Tim Mattox" <timat...@open-mpi.org> wrote:

> This still has a race condition... which can be dealt with using
> opal_atomic stuff.
> See below.
>
> On Thu, Mar 6, 2008 at 2:35 PM, <r...@osl.iu.edu> wrote:
>> Author: rhc
>> Date: 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>> New Revision: 17766
>> URL: https://svn.open-mpi.org/trac/ompi/changeset/17766
>>
>> Log:
>> Fix a race condition - ensure we don't call terminate in orterun more than
>> once, even if the timeout fires while we are doing so
> [snip]
>> Modified: trunk/orte/tools/orterun/orterun.c
>>
>> ==============================================================================
>> --- trunk/orte/tools/orterun/orterun.c (original)
>> +++ trunk/orte/tools/orterun/orterun.c 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>> @@ -112,14 +112,15 @@
>>  static bool want_prefix_by_default = (bool) ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT;
>>  static opal_event_t *orterun_event, *orteds_exit_event;
>>  static char *ompi_server=NULL;
>> +static bool terminating=false;
>>
> [snip]
>> @@ -644,6 +638,12 @@
>>      orte_proc_t **procs;
>>      orte_vpid_t i;
>>
>> +    /* flag that we are here to avoid doing it twice */
>> +    if (terminating) {
>> +        return;
>> +    }
>> +    terminating = true;
>> +
> [snip]
>
> I think this race condition should be dealt with like this:
>
> #include "opal/sys/atomic.h"
>
> static opal_atomic_lock_t terminating = OPAL_ATOMIC_UNLOCKED;
>
> ...
>
> if (opal_atomic_trylock(&terminating)) { /* returns 1 if already locked */
>     return;
> }
>