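This still has a race condition: the test of "terminating" and the
store to it are two separate steps, so a second caller can get past
the test before the first one has set the flag. That can be dealt
with using the opal_atomic routines.
See below.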

On Thu, Mar 6, 2008 at 2:35 PM,  <r...@osl.iu.edu> wrote:
> Author: rhc
>  Date: 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>  New Revision: 17766
>  URL: https://svn.open-mpi.org/trac/ompi/changeset/17766
>
>  Log:
>  Fix a race condition - ensure we don't call terminate in orterun more than once, even if the timeout fires while we are doing so
[snip]
>  Modified: trunk/orte/tools/orterun/orterun.c
>  ==============================================================================
>  --- trunk/orte/tools/orterun/orterun.c  (original)
>  +++ trunk/orte/tools/orterun/orterun.c  2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>  @@ -112,14 +112,15 @@
>   static bool want_prefix_by_default = (bool) ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT;
>   static opal_event_t *orterun_event, *orteds_exit_event;
>   static char *ompi_server=NULL;
>  +static bool terminating=false;
>
[snip]
>  @@ -644,6 +638,12 @@
>      orte_proc_t **procs;
>      orte_vpid_t i;
>
>  +    /* flag that we are here to avoid doing it twice */
>  +    if (terminating) {
>  +        return;
>  +    }
>  +    terminating = true;
>  +
[snip]

I think this race condition should be dealt with like this:

#include "opal/sys/atomic.h"

static opal_atomic_lock_t terminating = OPAL_ATOMIC_UNLOCKED;

...

if (opal_atomic_trylock(&terminating)) { /* returns 1 if already locked */
    return;
}
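
For anyone who wants to try the idea outside the tree, here is a minimal
standalone sketch of the same test-and-set guard. It uses C11 atomic_flag
as a stand-in for the opal lock, so it is not the opal_atomic API. The
names begin_terminate, terminate_once and terminate_runs are made up for
illustration and do not appear in orterun.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "terminating" flag in orterun.c.
 * C11 atomic_flag is used here instead of opal_atomic_lock_t. */
static atomic_flag terminating = ATOMIC_FLAG_INIT;

/* Illustration only: counts how often the teardown body really ran. */
static int terminate_runs = 0;

/* Returns true for the first caller only. The test and the set happen
 * as one atomic step, so two callers can never both see "not set". */
static bool begin_terminate(void)
{
    /* atomic_flag_test_and_set returns the previous value of the flag */
    return !atomic_flag_test_and_set(&terminating);
}

/* Hypothetical teardown routine guarded by the flag. */
static void terminate_once(void)
{
    if (!begin_terminate()) {
        return;             /* someone else is already terminating */
    }
    terminate_runs++;       /* the real teardown work would go here */
}
```

Calling terminate_once() any number of times runs the guarded body once.
The difference from the plain bool is that no caller can slip in between
the check and the assignment.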


-- 
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
    I'm a bright... http://www.the-brights.net/
