On Thu, 25 Jun 2009, Eugene Loh wrote:

I spoke with Brian and Jeff about this earlier today. Presumably, up through 1.2, mca_btl_sm_component_progress would poll and, if it received a message fragment, return. Then, presumably in 1.3.0, the behavior was changed to keep polling until the FIFO was empty. Brian said this was based on Terry's desire to keep latency as low as possible in benchmarks. Namely, reaching down into a progress call was a long code path. It would be better to pick up multiple messages, if available on the FIFO, and queue the extras up in the unexpected queue. Then, a subsequent call could more efficiently find the anticipated message fragment.

I don't see how the behavior would impact short-message pingpongs (the typical way to measure latency) one way or the other.

I asked Terry, who struggled to remember the issue and pointed me at this thread: http://www.open-mpi.org/community/lists/devel/2008/06/4158.php . But that is related to an issue that's solved if one keeps polling as long as one gets ACKs (but returns as soon as a real message fragment is found).

Can anyone shed some light on the history here? Why keep polling even when a message fragment has been found? The downside of polling too aggressively is that the unexpected queue can grow without bound.

Brian's proposal is to set some variable that determines how many message fragments a single mca_btl_sm_component_progress call can drain from the FIFO before returning.

I checked, and 1.3.2 definitely drains all messages until the FIFO is empty. If we were to switch to draining until we receive a data message, and that fixes Terry's issue, that seems like a rational change and would not require the fix I suggested. My assumption had been that we needed to drain more than one data message per call to component_progress in order to work around Terry's issue. If not, then let's go with the simple fix and only drain one data message per entry to component_progress (but drain multiple ACKs if we have a bunch of ACKs and then a data message in the queue).
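
To make the two options concrete, here is a rough sketch of the drain policy, not the actual sm BTL code. The names toy_fifo_t, frag_kind_t, toy_progress and the max_data parameter are all made up for illustration, and the FIFO is just an array with a read cursor. ACKs are always consumed; the loop stops once max_data data fragments have been handled or the FIFO is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fragment kinds; the real sm BTL headers are not modeled. */
typedef enum { FRAG_ACK, FRAG_DATA } frag_kind_t;

/* Toy FIFO (an assumption for this sketch): fixed array plus read cursor. */
typedef struct {
    const frag_kind_t *frags;
    size_t len;
    size_t head;
} toy_fifo_t;

/*
 * Drain the FIFO until it is empty or until max_data data fragments
 * have been handled. ACKs never count against the limit.
 * Returns the total number of fragments consumed; *n_data receives the
 * number of data fragments among them.
 */
static size_t toy_progress(toy_fifo_t *fifo, size_t max_data, size_t *n_data)
{
    size_t consumed = 0;
    size_t data = 0;

    assert(fifo != NULL && n_data != NULL);

    while (fifo->head < fifo->len && data < max_data) {
        frag_kind_t kind = fifo->frags[fifo->head++];
        consumed++;
        if (kind == FRAG_DATA) {
            data++;   /* a real progress function would deliver or queue it */
        }
    }

    *n_data = data;
    return consumed;
}
```

With max_data set to 1 this behaves like the simple fix: any leading ACKs are drained and the call returns after the first data message. A larger max_data corresponds to the tunable limit I suggested earlier, and a very large value approximates the drain-until-empty behavior in 1.3.2.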

Unfortunately I have no more history than what Terry proposed, but it looks like the changes were made around that time.

Brian
