I am running into a situation that I don’t understand, so I thought I would toss
it out and see if someone can give me a hint on how to deal with what I am seeing.
I am making a call to MPI_Wait(), which ends up in the following call
sequence:
  - ompi_request_default_wait()
  - ompi_request_wait_completion(), which goes to
        while (false == req->req_complete) {
            opal_condition_wait(&ompi_request_cond, &ompi_request_lock);
        }
The value of ompi_request_cond.c_signaled is 0, so when opal_condition_wait()
is called the code goes to
        while (c->c_signaled == 0) {
            opal_progress();
            OPAL_CR_TEST_CHECKPOINT_READY_STALL();
        }
This spins forever, since c->c_signaled remains 0 (even though the condition
that the wait is testing for has long since been satisfied).

It looks like the value of c_signaled is changed in opal_condition_signal(),
opal_condition_broadcast(), opal_condition_timedwait(), or later on in
opal_condition_wait(), but not in the loop the code is stuck in.
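
To make the symptom concrete, here is a toy single-threaded model of what I
think I am seeing. This is only a sketch: every toy_* name is made up, the
"request completes on the third progress call" rule is invented for the demo,
and the spin limit exists only so the example terminates. I am also assuming
that the waiter consumes one signal after leaving the loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state; the fields only loosely mirror the real ones. */
typedef struct {
    int  c_signaled;      /* stands in for c->c_signaled */
    bool req_complete;    /* stands in for req->req_complete */
    int  progress_calls;  /* demo bookkeeping only */
} toy_state_t;

/* Hypothetical signal: lets one spinning waiter leave its loop. */
static void toy_condition_signal(toy_state_t *s)
{
    s->c_signaled++;
}

/* Hypothetical progress engine: the request completes on the third call.
 * With signal_on_complete == false the flag is set but nobody signals,
 * which is the case I seem to be hitting. */
static void toy_progress(toy_state_t *s, bool signal_on_complete)
{
    s->progress_calls++;
    if (s->progress_calls == 3) {
        s->req_complete = true;
        if (signal_on_complete) {
            toy_condition_signal(s);
        }
    }
}

/* Spin like the inner loop, but give up after max_spins.
 * Returns true only if the loop ended because c_signaled became nonzero. */
static bool toy_condition_wait(toy_state_t *s, bool signal_on_complete,
                               int max_spins)
{
    assert(max_spins > 0);
    for (int i = 0; i < max_spins && s->c_signaled == 0; i++) {
        toy_progress(s, signal_on_complete);
    }
    if (s->c_signaled == 0) {
        return false;     /* still stuck: completed or not, never signaled */
    }
    s->c_signaled--;      /* assumption: the waiter consumes one signal */
    return true;
}
```

In this model, calling toy_condition_wait() with signal_on_complete set to
false leaves req_complete true while the loop never exits on its own, which
matches what I observe: the loop only looks at c_signaled, never at the
request itself.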

Does anyone on the list know how this code is supposed to work and, if so, are
there any hints?

Looking a bit more, it seems like ompi_request_complete() needs to be called.
Can someone explain the assumptions this routine makes?

    if( NULL != request->req_complete_cb ) {
        request->req_complete_cb( request );
    }
    ompi_request_completed++;
    request->req_complete = true;
    if(with_signal && ompi_request_waiting) {
        /* Broadcast the condition, otherwise if there is already a thread
         * waiting on another request it can use all signals.
         */
        opal_condition_broadcast(&ompi_request_cond);
    }
    }
    return OMPI_SUCCESS;
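
Here is my current guess at how that routine interacts with a waiter, written
as a toy model so someone can tell me where it is wrong. The toy_* names are
mine, and two things are pure assumptions on my part: that the waiting counter
holds the number of callers currently blocked in a wait, and that a broadcast
makes one signal available per waiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy request: only the two fields used in the quoted snippet. */
typedef struct toy_request {
    bool req_complete;
    void (*req_complete_cb)(struct toy_request *);
} toy_request_t;

/* Toy globals standing in for the ompi_request_* variables. */
typedef struct {
    int completed;   /* stands in for ompi_request_completed */
    int waiting;     /* stands in for ompi_request_waiting (assumed: waiter count) */
    int c_signaled;  /* stands in for ompi_request_cond.c_signaled */
} toy_globals_t;

/* Follows the quoted snippet step by step; the "broadcast" body is a guess. */
static int toy_request_complete(toy_globals_t *g, toy_request_t *req,
                                bool with_signal)
{
    assert(g != NULL && req != NULL);
    if (NULL != req->req_complete_cb) {
        req->req_complete_cb(req);
    }
    g->completed++;
    req->req_complete = true;
    if (with_signal && g->waiting > 0) {
        /* Guess: broadcast makes one signal available for every waiter. */
        g->c_signaled = g->waiting;
    }
    return 0;
}
```

If this guess is right, then a request that gets marked complete with
with_signal false, or by code that sets req_complete directly, would leave
c_signaled at 0 and a spinning waiter would never notice.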

What is the significance of ompi_request_completed? Is this counter used to
manage something?
What is ompi_request_cond used for?
What is ompi_request_waiting used for?

Thanks,
Rich
