George Bosilca wrote:
Terry,
We had a discussion about this a few weeks ago. I have a version that
modifies this behavior (SM progress will not return as long as there are
pending acks). There was no benefit from doing so (even if one might
think that fewer calls to opal_progress would improve performance).
But my concern is not the raw performance of MPI_Iprobe in this case but
more the interaction between MPI and an application. The concern is
that if it takes two MPI_Iprobes to get to the real message (instead of
one), could this induce a synchronization delay in an application? That
is, by not receiving the "real" message in the first MPI_Iprobe, the
application may decide to do other work while the other processes are
potentially blocked waiting for it to do some communication.
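For illustration, here is a sketch of the kind of polling loop in
question (the handle_message/do_other_work helpers are hypothetical
stand-ins for application code, and the 256-byte buffer just matches the
small-message case discussed here):

/* Sketch of the polling pattern in question; the helpers are
 * hypothetical stand-ins for real application code. */
#include <mpi.h>

static void do_other_work(void)
{
    /* e.g. local computation the rank falls back to */
}

static void handle_message(MPI_Status *status, MPI_Comm comm)
{
    char buf[256];   /* assumes small messages, as in the case above */
    int count = 0;
    MPI_Get_count(status, MPI_CHAR, &count);
    MPI_Recv(buf, count, MPI_CHAR, status->MPI_SOURCE,
             status->MPI_TAG, comm, MPI_STATUS_IGNORE);
}

void poll_and_work(MPI_Comm comm)
{
    int flag = 0;
    MPI_Status status;

    /* Ask MPI whether a message is already waiting. */
    MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, comm, &flag, &status);
    if (flag) {
        /* The message is visible: receive and handle it right away. */
        handle_message(&status, comm);
    } else {
        /* Nothing visible yet, so go do other (possibly long) work,
         * even though the real message may already be sitting in the
         * shared-memory fifo behind a pending ack. */
        do_other_work();
    }
}

If the first MPI_Iprobe comes back empty only because progress stopped
at an ack, this rank wanders off into do_other_work() while its peer may
be blocked waiting for the matching communication.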
In fact TCP has the potential to exhibit the same behavior. However,
after each successful poll TCP empties the socket, so it might read
more than one message. Because we have to empty the temporary buffer,
we interpret most of the messages inside it, and this is why TCP
exhibits a different behavior.
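To make the contrast concrete, here is a generic sketch of that
drain-on-poll style of progress (purely illustrative: the length-prefix
framing and the deliver() callback are invented, this is not the actual
TCP BTL code):

/* Generic "drain on each poll" sketch: read whatever the non-blocking
 * socket currently has, then dispatch every complete length-prefixed
 * message found in the buffer, not just the first one. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

static char   buf[65536];
static size_t buffered = 0;

/* Hypothetical upper-layer callback, one call per complete message. */
static void deliver(const char *msg, uint32_t len) { (void) msg; (void) len; }

void drain_socket(int fd)
{
    /* Pull in everything the socket has right now. */
    for (;;) {
        if (buffered == sizeof(buf))
            break;                        /* buffer full: dispatch first */
        ssize_t n = read(fd, buf + buffered, sizeof(buf) - buffered);
        if (n <= 0)
            break;                        /* closed, error, or would block */
        buffered += (size_t) n;
    }

    /* Interpret every complete message sitting in the buffer. */
    size_t off = 0;
    while (buffered - off >= sizeof(uint32_t)) {
        uint32_t len;
        memcpy(&len, buf + off, sizeof(len));
        if (buffered - off - sizeof(len) < len)
            break;                        /* partial message: wait for more */
        deliver(buf + off + sizeof(len), len);
        off += sizeof(len) + len;
    }
    memmove(buf, buf + off, buffered - off);
    buffered -= off;
}

The point of the sketch is only that a single poll can hand several
messages (and any acknowledgements mixed in with them) to the upper
layer, which is why the TCP path does not hide a real message behind a
control message the way the SM fifo can.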
I guess this difference in behavior between the SM BTL and the TCP BTL
is what disturbs me. Does processing just one fifo entry per sm_progress
call per connection buy us any performance? Would draining the acks be
detrimental to performance? Wouldn't delivering the messages at the time
they arrive meet the rule of obviousness for application writers?
I know there is a slippery slope here of saying that, OK, once you've
read one message you should keep reading until there are none left on
the fifo. I believe that is really debatable and could go either way
depending on the application. But ack messages are not visible to the
users, which is why I was only asking about draining the ack packets.
--td
george.
On Jun 19, 2008, at 2:16 PM, Terry Dontje wrote:
Galen, George and others that might have SM BTL interest.
In my quest of looking at MPI_Iprobe performance I found what I think
is an issue. If an application that is using the SM BTL does a small
message send (<= 256 bytes) followed by an MPI_Iprobe, the
mca_btl_sm_component progress function that is eventually called as a
result of opal_progress will receive an ack message from its own send
and then return. The net effect is that the real message, which sits
behind the ack message, doesn't get read until a second MPI_Iprobe is
made.
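A minimal sketch of a two-rank test for this (the sleep, tags, and
message sizes are arbitrary illustration values; on a transport that is
not affected, the reply would normally be visible on the first probe):

/* Sketch: rank 0 does a small send, then counts how many MPI_Iprobe
 * calls it takes before rank 1's reply becomes visible. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    char msg[64] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        int flag = 0, probes = 0;
        MPI_Status status;

        /* Small (<= 256 byte) send whose ack comes back over the fifo. */
        MPI_Send(msg, sizeof(msg), MPI_CHAR, 1, 1, MPI_COMM_WORLD);
        sleep(1);   /* crude way to let rank 1's reply land in the fifo */

        do {        /* count the probes needed to see the reply */
            probes++;
            MPI_Iprobe(1, 2, MPI_COMM_WORLD, &flag, &status);
        } while (!flag);
        printf("reply visible after %d MPI_Iprobe call(s)\n", probes);

        MPI_Recv(msg, sizeof(msg), MPI_CHAR, 1, 2, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else if (1 == rank) {
        MPI_Recv(msg, sizeof(msg), MPI_CHAR, 0, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(msg, sizeof(msg), MPI_CHAR, 0, 2, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}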
It seems to me that mca_btl_sm_component should read all ack messages
from a particular fifo until it either finds a real send fragment or
there are no more messages on the fifo. Otherwise, we are forcing calls
like MPI_Iprobe to not return messages that are really there. I am not
sure about IB, but I know that the TCP BTL does not show this issue
(which doesn't surprise me, since I imagine the BTL relies on TCP to
handle this type of protocol work).
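To be concrete, here is a rough sketch of the control flow I have in
mind (the fragment/fifo types and the helper bodies below are
placeholders, not the actual sm BTL data structures or functions):

/* Sketch: drain any number of acks in one progress call, but still
 * deliver at most one real send fragment per call.  All types and
 * helpers here are placeholders for illustration only. */
#include <stddef.h>

typedef enum { FRAG_ACK, FRAG_SEND } frag_type_t;

typedef struct frag {
    frag_type_t  type;
    struct frag *next;
    /* ... header, payload, etc. ... */
} frag_t;

typedef struct {
    frag_t *head;                 /* placeholder fifo: a simple list */
} fifo_t;

static frag_t *fifo_pop(fifo_t *fifo)
{
    frag_t *frag = fifo->head;
    if (NULL != frag)
        fifo->head = frag->next;
    return frag;
}

/* Placeholder actions: return the fragment to its sender / hand the
 * data up to the next layer. */
static void process_ack(frag_t *frag)  { (void) frag; }
static void deliver_frag(frag_t *frag) { (void) frag; }

int progress_one_connection(fifo_t *fifo)
{
    frag_t *frag;
    int delivered = 0;

    while (NULL != (frag = fifo_pop(fifo))) {
        if (FRAG_ACK == frag->type) {
            /* Acks are invisible to the user: keep draining them so
             * they cannot hide a real message from the next probe. */
            process_ack(frag);
            continue;
        }
        deliver_frag(frag);
        delivered = 1;
        break;                    /* still one real message per call */
    }
    return delivered;
}

Note that real send fragments still end the loop, so this only swallows
the user-invisible acks rather than draining everything on the fifo.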
Before I go munging with the code I wanted to make sure I am not
overlooking something here. One concern: if I change the code to drain
all the ack messages, is that going to disrupt performance elsewhere?
--td
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel