Terry,

We had a discussion about this a few weeks ago. I have a version that modifies this behavior (SM progress does not return as long as there are pending acks). There was no benefit from doing so, even though one might think that fewer calls to opal_progress would improve performance.

In fact, TCP has the potential to exhibit the same behavior. However, after each successful poll TCP empties the socket, so it might read more than one message. Because we have to empty the temporary buffer, we interpret most of the messages inside it, and that is why TCP exhibits a different behavior.
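
Roughly the pattern, as a hedged sketch only: it assumes a simple length-prefixed framing and a placeholder dispatch() routine, and is not the real btl_tcp code.

    /* Hypothetical illustration of the TCP-style pattern -- not the
     * actual btl_tcp source.  One successful poll fills a temporary
     * buffer; every complete message in that buffer is then
     * interpreted, so a single progress call can consume an ack and
     * the data fragment sitting behind it. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* placeholder: hand one decoded message to the upper layer */
    static void dispatch(const uint8_t *payload, uint32_t len)
    {
        (void)payload; (void)len;
    }

    /* Drain every complete message found in buf[0..len).  Returns the
     * number of bytes consumed; a trailing partial message is left
     * for the next poll. */
    static size_t drain_buffer(const uint8_t *buf, size_t len)
    {
        size_t off = 0;
        while (len - off >= 4) {                  /* assumed 4-byte length prefix */
            uint32_t msg_len;
            memcpy(&msg_len, buf + off, 4);
            if (len - off < 4 + (size_t)msg_len)
                break;                            /* incomplete message: wait for more data */
            dispatch(buf + off + 4, msg_len);
            off += 4 + (size_t)msg_len;
        }
        return off;
    }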

  george.

On Jun 19, 2008, at 2:16 PM, Terry Dontje wrote:

Galen, George, and others who might have an interest in the SM BTL:

In my quest of looking at MPI_Iprobe performance I found what I think is an issue. If an application using the SM BTL does a small message send (<=256 bytes) followed by an MPI_Iprobe, the mca_btl_sm_component function that is eventually called as a result of opal_progress will receive an ack message from its earlier send and then return. The net effect is that the real message sitting behind the ack doesn't get read until a second MPI_Iprobe is made. It seems to me that mca_btl_sm_component should read all ack messages from a particular fifo until it either finds a real send fragment or there are no more messages on the fifo. Otherwise, we are forcing calls like MPI_Iprobe to not return messages that are really there. I am not sure about IB, but I know that the TCP BTL does not show this issue (which doesn't surprise me, since I imagine the BTL relies on TCP to handle this type of protocol work).

Before I go munging with the code I wanted to make sure I am not overlooking something here. One concern: if I change the code to drain all the ack messages, is that going to disrupt performance elsewhere?

--td
