Terry,
We had a discussion about this a few weeks ago. I have a version that
modifies this behavior (SM progress will not return as long as there are
pending acks). There was no benefit from doing so (even if one might
think that fewer calls to opal_progress would improve performance).
In fact, TCP has the potential to exhibit the same behavior. However,
the TCP BTL empties the socket after each successful poll, so it might
read more than one message. Since we have to empty the temporary buffer,
we interpret most of the messages inside it, and this is why TCP
exhibits a different behavior.
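
Roughly, such a change amounts to a loop along these lines (illustrative
types and helper names only, not the actual btl_sm code):

/* Sketch only: illustrative types and helpers, not the real Open MPI internals. */
typedef enum { SM_HDR_ACK, SM_HDR_SEND } sm_hdr_type_t;

typedef struct sm_hdr {
    sm_hdr_type_t type;
    /* ... peer, length, payload ... */
} sm_hdr_t;

/* assumed helpers: read one item off the circular fifo (NULL when empty),
 * return an acked fragment to the sender's free list, and hand a real
 * fragment up to the PML */
sm_hdr_t *sm_fifo_read(void);
void sm_return_frag(sm_hdr_t *hdr);
void sm_deliver(sm_hdr_t *hdr);

static int sm_progress_drain(void)
{
    int nfrags = 0;
    sm_hdr_t *hdr;

    /* Keep reading until the fifo is empty or a real fragment shows up;
     * acks are consumed internally instead of ending the progress call. */
    while (NULL != (hdr = sm_fifo_read())) {
        if (SM_HDR_ACK == hdr->type) {
            sm_return_frag(hdr);   /* give the fragment back to the sender */
            continue;              /* ...and keep draining */
        }
        sm_deliver(hdr);           /* real send fragment: deliver it */
        nfrags++;
        break;
    }
    return nfrags;
}
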
george.
On Jun 19, 2008, at 2:16 PM, Terry Dontje wrote:
Galen, George, and others who might have an interest in the SM BTL:
While looking at MPI_Iprobe performance, I found what I think is an
issue. If an application that is using the SM BTL does a small message
send (<= 256 bytes) followed by an MPI_Iprobe, the mca_btl_sm_component
function that is eventually called as a result of opal_progress will
receive an ack message from its send and then return. The net effect is
that the real message sitting behind the ack message doesn't get read
until a second MPI_Iprobe is made.
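
A sketch of a reproducer for that sequence (two ranks on the same node
so the SM BTL is used; the exact interleaving is of course timing
dependent):

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, flag = 0;
    char buf[64] = "ping";              /* well under the 256 byte threshold */
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (1 == rank) {
        MPI_Send(buf, 64, MPI_CHAR, 0, 0, MPI_COMM_WORLD);      /* the "real" message */
        MPI_Recv(buf, 64, MPI_CHAR, 0, 1, MPI_COMM_WORLD, &st); /* drain rank 0's send */
    } else if (0 == rank) {
        MPI_Send(buf, 64, MPI_CHAR, 1, 1, MPI_COMM_WORLD);      /* small send; its ack lands in our fifo */
        sleep(1);                       /* let the ack and the incoming fragment queue up */
        MPI_Iprobe(1, 0, MPI_COMM_WORLD, &flag, &st);
        printf("first  MPI_Iprobe: flag=%d\n", flag);           /* per the behavior above: 0 */
        MPI_Iprobe(1, 0, MPI_COMM_WORLD, &flag, &st);
        printf("second MPI_Iprobe: flag=%d\n", flag);           /* per the behavior above: 1 */
        MPI_Recv(buf, 64, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &st);
    }
    MPI_Finalize();
    return 0;
}
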
It seems to me that mca_btl_sm_component should read all ack messages
from a particular fifo until it either finds a real send fragment or
there are no more messages on the fifo. Otherwise, we are forcing calls
like MPI_Iprobe not to return messages that are really there.
I am not sure about IB, but I know that the TCP BTL does not show this
issue (which doesn't surprise me, since I imagine that BTL relies on TCP
to handle this type of protocol stuff).
Before I go munging with the code, I wanted to make sure I am not
overlooking something here. One concern: if I change the code to drain
all the ack messages, is that going to disrupt performance elsewhere?
--td