I am running into a situation that I don't understand, so I thought I would
toss it out and see if someone can give me a hint on how to deal with it.
I am making a call to MPI_Wait(), which ends up in the following call
sequence:
- ompi_request_default_wait()
- ompi_request_wait_completion(), which goes to
    while(false == req->req_complete) {
        opal_condition_wait(&ompi_request_cond, &ompi_request_lock);
    }
The value of ompi_request_cond.c_signaled is 0, so when opal_condition_wait()
is called the code goes to
    while (c->c_signaled == 0) {
        opal_progress();
        OPAL_CR_TEST_CHECKPOINT_READY_STALL();
    }
This spins forever, since c->c_signaled remains 0 (even though the condition
that the wait is testing for has long since been satisfied).
It looks like the value of c_signaled is changed in opal_condition_signal(),
opal_condition_broadcast(), opal_condition_timedwait(), or later on in
opal_condition_wait(), but not in the loop the code is stuck in.
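To make the symptom concrete, here is a minimal standalone sketch of the wait
path as I read it. It is not the Open MPI code: every demo_ name is made up,
the progress function is a stand-in that "completes" a request on its third
call, and the spin limit exists only so the sketch terminates.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the condition; only the counter matters here. */
typedef struct { int c_signaled; } demo_cond_t;

static demo_cond_t demo_cond = { 0 };
static bool demo_req_complete = false;
static int demo_progress_calls = 0;

/* Put the toy state back to "nothing signaled, nothing complete". */
static void demo_reset(void)
{
    demo_cond.c_signaled = 0;
    demo_req_complete = false;
    demo_progress_calls = 0;
}

/* Stand-in for the progress engine: on the third call the request is
 * marked complete, and the condition is signaled only if asked to. */
static void demo_progress(bool signal_on_completion)
{
    if (++demo_progress_calls == 3) {
        demo_req_complete = true;
        if (signal_on_completion) {
            demo_cond.c_signaled++;
        }
    }
}

/* Same shape as the loop I am stuck in: the exit test looks only at
 * c_signaled, never at the request. Returns the number of spins, or -1
 * if max_spins was reached (i.e. it would have spun forever). */
static int demo_condition_wait(bool signal_on_completion, int max_spins)
{
    int spins = 0;
    while (demo_cond.c_signaled == 0) {
        if (spins == max_spins) {
            return -1;
        }
        demo_progress(signal_on_completion);
        spins++;
    }
    demo_cond.c_signaled--;   /* consume the signal */
    return spins;
}
```

In this toy, demo_condition_wait(false, 1000) returns -1 even though
demo_req_complete became true on the third spin, which matches what I see.
With demo_condition_wait(true, 1000) the loop exits after 3 spins.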
Does anyone on the list know how this code is supposed to work, and if so, are
there any hints?
Looking a bit more, it seems like ompi_request_complete() needs to be called.
Can someone explain the assumptions this routine makes?
    if( NULL != request->req_complete_cb ) {
        request->req_complete_cb( request );
    }
    ompi_request_completed++;
    request->req_complete = true;
    if(with_signal && ompi_request_waiting) {
        /* Broadcast the condition, otherwise if there is already a thread
         * waiting on another request it can use all signals.
         */
        opal_condition_broadcast(& ompi_request_cond);
    }
    return OMPI_SUCCESS;
What is the significance of ompi_request_completed? Is this counter used to
manage something?
What is ompi_request_cond used for?
What is ompi_request_waiting used for?
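For reference, this is my current guess at how the pieces are meant to fit
together, written as a single-threaded toy. All toy_ names are hypothetical,
and two things are assumptions on my part rather than facts from the source:
that the waiter bumps a "waiting" counter around its loop, and that a
broadcast sets c_signaled to the number of waiters.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct toy_request {
    bool req_complete;
    void (*req_complete_cb)(struct toy_request *);
} toy_request_t;

typedef struct { int c_waiting; int c_signaled; } toy_cond_t;

static toy_cond_t toy_request_cond = { 0, 0 };
static int toy_request_waiting = 0;     /* waiters currently in the wait loop */
static int toy_request_completed = 0;   /* running count of completions */
static int toy_broadcasts = 0;          /* instrumentation for this toy only */
static toy_request_t *toy_pending = NULL;

/* Assumption: a broadcast wakes every waiter by giving each one a signal. */
static void toy_condition_broadcast(toy_cond_t *c)
{
    c->c_signaled = c->c_waiting;
    toy_broadcasts++;
}

/* Mirrors the shape of the routine quoted above. */
static void toy_request_complete(toy_request_t *req, bool with_signal)
{
    if (NULL != req->req_complete_cb) {
        req->req_complete_cb(req);
    }
    toy_request_completed++;
    req->req_complete = true;
    if (with_signal && toy_request_waiting) {
        toy_condition_broadcast(&toy_request_cond);
    }
}

/* Stand-in progress engine: completes the one pending request, if any. */
static void toy_progress(void)
{
    if (NULL != toy_pending) {
        toy_request_t *req = toy_pending;
        toy_pending = NULL;
        toy_request_complete(req, true);
    }
}

/* Waiter: leaves the inner loop only when the completion path signals.
 * Note: spins forever if nothing ever completes req with a signal. */
static void toy_wait_completion(toy_request_t *req)
{
    if (req->req_complete) {
        return;
    }
    toy_request_waiting++;
    while (false == req->req_complete) {
        toy_request_cond.c_waiting++;
        while (toy_request_cond.c_signaled == 0) {
            toy_progress();
        }
        toy_request_cond.c_signaled--;
        toy_request_cond.c_waiting--;
    }
    toy_request_waiting--;
}
```

If that reading is right, a request that is marked complete by any path that
skips the completion routine (or passes with_signal as false) would leave a
waiter spinning exactly as described above. Corrections welcome.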
Thanks,
Rich