On Jan 11, 2006, at 3:05 AM, Rainer Keller wrote:

Hello dear all,
I had a point on the tbd-list, that I would like to ask here:
 - Shouldn't we have a while-loop condition around every occurrence
   of opal_condition_wait (spurious wake-ups)
   As we may do a pthread_cond_wait,
    e.g. in opal_free_list.h and OPAL_FREE_LIST_WAIT ?

I finally got a chance to look at this, and I think for the most part we're ok. There are two that worry me, but I wanted Ralph and Tim to weigh in before I did anything. More info below...

Occurrences:
      ompi/class/ompi_free_list.h

This is ok as is, because the loop protecting against a spurious wakeup is already there. If two threads are waiting, both are woken up, and there's only one request (or somehow, no requests), then they'll try to remove from the list, get NULL, and continue through the bigger while() loop. So that works as expected.
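For anyone following along, the shape that makes this safe can be sketched roughly as below. This is a plain pthreads stand-in, not the actual ompi_free_list code: free_list_t, free_list_wait and free_list_return are hypothetical names made up for illustration, and I'm assuming opal_condition_wait ends up in pthread_cond_wait as Rainer describes.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a free list; NOT the Open MPI data structure. */
typedef struct item { struct item *next; } item_t;

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    item_t         *head;
} free_list_t;

void free_list_init(free_list_t *fl)
{
    pthread_mutex_init(&fl->lock, NULL);
    pthread_cond_init(&fl->cond, NULL);
    fl->head = NULL;
}

/* Block until an item is available. The while loop re-checks the list
 * after every wake-up, so a spurious wake-up (or losing the race to
 * another waiter) simply sends the thread back into the wait. */
item_t *free_list_wait(free_list_t *fl)
{
    item_t *it;

    pthread_mutex_lock(&fl->lock);
    while (NULL == (it = fl->head)) {
        pthread_cond_wait(&fl->cond, &fl->lock);
    }
    fl->head = it->next;
    pthread_mutex_unlock(&fl->lock);
    return it;
}

/* Put an item back and wake all waiters; only one of them will find it. */
void free_list_return(free_list_t *fl, item_t *it)
{
    assert(NULL != it);
    pthread_mutex_lock(&fl->lock);
    it->next = fl->head;
    fl->head = it;
    pthread_cond_broadcast(&fl->cond);
    pthread_mutex_unlock(&fl->lock);
}
```

With a plain if in place of the while, a thread that wakes up to an empty list would go on to dereference a NULL head.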

      opal/class/opal_free_list.h

Same reasoning as ompi_free_list.

      ompi/request/req_wait.c          /* Two Occurrences: not a
               must, but... */

I believe these are both correct. The first is in a larger do { ... } while loop that will handle the case of a wakeup with no requests ready. The second is in a tight while() loop already, so we're ok there.

      orte/mca/gpr/proxy/gpr_proxy_compound_cmd.c

This one I'd like Ralph to look at, because I'm not sure I understand the logic completely. It looks like a potential problem. Only one woken thread can proceed at a time, since the mutex has to be re-acquired. So the question becomes whether anyone gives up the mutex with component.compound_cmd_mode still set to true, and I think the answer is yes. This could be a bug if people use the compound command code while multiple threads are active. Thankfully, I don't think that happens very often.

      orte/mca/iof/base/iof_base_flush.c:108

This looks like it's wrapped in a larger while loop, so it should be safe: after a spurious wake-up it just goes around the loop and waits again.

      orte/mca/pls/rsh/pls_rsh_module.c:892

This could be a bit of a problem, but I don't think spurious wake-ups will do any real harm here. The worst case is that we end up concurrently starting more processes than we intended. But Tim might have more insight than I do.
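To make that worst case concrete, here is a rough sketch of a launch throttle. It is not the code in pls_rsh_module.c: throttle_t and the throttle_* functions are hypothetical names, and I'm only assuming the real code limits concurrent launches with a counter and a condition variable in some similar way.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical launch throttle; names are illustrative only and are
 * NOT taken from pls_rsh_module.c. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             active;      /* launches currently in flight */
    int             max_active;  /* intended concurrency limit */
} throttle_t;

void throttle_init(throttle_t *t, int max_active)
{
    pthread_mutex_init(&t->lock, NULL);
    pthread_cond_init(&t->cond, NULL);
    t->active = 0;
    t->max_active = max_active;
}

/* Claim a launch slot. The while loop re-tests the limit after every
 * wake-up; with a plain if, a spurious wake-up would fall through and
 * push active above max_active. */
void throttle_acquire(throttle_t *t)
{
    pthread_mutex_lock(&t->lock);
    while (t->active >= t->max_active) {
        pthread_cond_wait(&t->cond, &t->lock);
    }
    t->active++;
    pthread_mutex_unlock(&t->lock);
}

/* Give the slot back when a launch completes and wake one waiter. */
void throttle_release(throttle_t *t)
{
    pthread_mutex_lock(&t->lock);
    assert(t->active > 0);
    t->active--;
    pthread_cond_signal(&t->cond);
    pthread_mutex_unlock(&t->lock);
}
```

In a setup like this, the if version would only overshoot the limit by one launch per spurious wake-up, which is why I'd call it harmless rather than fatal.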


Just my $0.02

Brian
