On Thu, 25 Jun 2009, Eugene Loh wrote:
> I spoke with Brian and Jeff about this earlier today. Presumably, up
> through 1.2, mca_btl_sm_component_progress would poll and, if it received
> a message fragment, would return. Then, presumably in 1.3.0, the behavior
> was changed to keep polling until the FIFO was empty. Brian said this was
> based on Terry's desire to keep latency as low as possible in benchmarks:
> reaching down into a progress call was a long code path, so it would be
> better to pick up multiple messages, if available on the FIFO, and queue
> the extras on the unexpected queue. A subsequent call could then find the
> anticipated message fragment more efficiently.
>
> I don't see how this behavior would affect short-message ping-pongs (the
> typical way to measure latency) one way or the other.
>
> I asked Terry, who struggled to remember the issue and pointed me at this
> thread: http://www.open-mpi.org/community/lists/devel/2008/06/4158.php .
> But that thread concerns an issue that is solved if one keeps polling as
> long as one gets ACKs (and returns as soon as a real message fragment is
> found).
>
> Can anyone shed some light on the history here? Why keep polling even
> when a message fragment has been found? The downside of polling too
> aggressively is that the unexpected queue can grow without bound.
>
> Brian's proposal is to set some variable that determines how many message
> fragments a single mca_btl_sm_component_progress call can drain from the
> FIFO before returning.

I checked, and 1.3.2 definitely drains all messages until the FIFO is
empty. If we were to switch to draining until we receive a data message,
and that fixes Terry's issue, that seems like a rational change and would
not require the fix I suggested. My assumption had been that we needed to
drain more than one data message per call to component_progress in order
to work around Terry's issue. If not, then let's go with the simple fix
and only drain one data message per entry into component_progress (but
drain multiple ACKs if we have a bunch of ACKs followed by a data message
in the queue).
Unfortunately, I have no more history than what Terry proposed, but it
looks like the changes were made around that time.
Brian