Thanks Tim - good suggestion! Had to modify your proposed code a tad to get
it to compile and work, but it is definitely a cleaner solution.

Ralph


On 3/6/08 1:34 PM, "Tim Mattox" <timat...@open-mpi.org> wrote:

> This still has a race condition... which can be dealt with using
> opal_atomic stuff.
> See below.
> 
> On Thu, Mar 6, 2008 at 2:35 PM,  <r...@osl.iu.edu> wrote:
>> Author: rhc
>>  Date: 2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>>  New Revision: 17766
>>  URL: https://svn.open-mpi.org/trac/ompi/changeset/17766
>> 
>>  Log:
>>  Fix a race condition - ensure we don't call terminate in orterun more than
>> once, even if the timeout fires while we are doing so
> [snip]
>>  Modified: trunk/orte/tools/orterun/orterun.c
>>  
>> 
>>  ==============================================================================
>>  --- trunk/orte/tools/orterun/orterun.c  (original)
>>  +++ trunk/orte/tools/orterun/orterun.c  2008-03-06 14:35:57 EST (Thu, 06 Mar 2008)
>>  @@ -112,14 +112,15 @@
>>   static bool want_prefix_by_default = (bool) ORTE_WANT_ORTERUN_PREFIX_BY_DEFAULT;
>>   static opal_event_t *orterun_event, *orteds_exit_event;
>>   static char *ompi_server=NULL;
>>  +static bool terminating=false;
>> 
> [snip]
>>  @@ -644,6 +638,12 @@
>>      orte_proc_t **procs;
>>      orte_vpid_t i;
>> 
>>  +    /* flag that we are here to avoid doing it twice */
>>  +    if (terminating) {
>>  +        return;
>>  +    }
>>  +    terminating = true;
>>  +
> [snip]
> 
> I think this race condition should be dealt with like this:
> 
> #include "opal/sys/atomic.h"
> 
> static opal_atomic_lock_t terminating = OPAL_ATOMIC_UNLOCKED;
> 
> ...
> 
> if (opal_atomic_trylock(&terminating)) { /* returns 1 if already locked */
>     return;
> }
> 
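
For readers without the OPAL headers at hand, here is a minimal sketch of the
same test-and-set idea using standard C11 <stdatomic.h> in place of the
opal_atomic API suggested above. The names terminate_once and terminate_calls
are hypothetical and exist only for illustration; this is not the code that
was committed to orterun.c.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the "terminating" guard.
 * A clear flag means nobody has started terminating yet. */
static atomic_flag terminating = ATOMIC_FLAG_INIT;

/* Hypothetical counter, only here so the effect can be observed. */
static int terminate_calls = 0;

/* Returns true if this call performed the termination work,
 * false if an earlier call had already claimed it. */
static bool terminate_once(void)
{
    /* atomic_flag_test_and_set() sets the flag and returns its previous
     * value in one atomic step, so only the first caller sees "false". */
    if (atomic_flag_test_and_set(&terminating)) {
        return false;   /* someone else is already terminating */
    }

    terminate_calls++;  /* placeholder for the real termination work */
    return true;
}
```

The first call to terminate_once() returns true and every later call returns
false. The difference from the plain bool in the quoted diff is that the check
and the set cannot be separated: with "if (terminating)" followed by
"terminating = true", a second entry that lands between the two statements
would also pass the check.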

