Re: [OMPI devel] trac ticket 1944 and pending sends

Eugene Loh Tue, 23 Jun 2009 13:22:58 -0400

George Bosilca wrote:

On Jun 23, 2009, at 11:04, Eugene Loh wrote:
The sm BTL used to have two mechanisms for dealing with congested FIFOs. One was to grow the FIFOs. Another was to queue pending sends locally (on the sender's side). I think the grow-FIFO mechanism was typically invoked and the pending-send mechanism used only under extreme circumstances (no more memory).
With the sm makeover of 1.3.2, we dropped the ability to grow FIFOs. The code added complexity and there seemed to be no need to have two mechanisms to deal with congested FIFOs. In ticket 1944, however, we see that repeated collectives can produce hangs, and this seems to be due to the pending-send code not adequately dealing with congested FIFOs.
Today, when a process tries to write to a remote FIFO and fails, it queues the write as a pending send. The only condition under which it retries pending sends is when it gets a fragment back from a remote process.
I think the logic must have been that the FIFO got congested because we issued too many sends. Getting a fragment back indicates that the remote process has made progress digesting those sends. In ticket 1944, we see that a FIFO can also get congested from too many returning fragments. Further, with shared FIFOs, a FIFO could become congested due to the activity of a third-party process.
In sum, getting a fragment back from a remote process is a poor indicator that it's time to retry pending sends.
Maybe the real way to know when to retry pending sends is just to check if there's room on the FIFO.
Why is this different from "getting a fragment back"?


I'm not sure I understand your question.

Say we have two processes, A and B. Each one has a receive queue/FIFO that can be written by its peer. Let's say A sends lots of messages to B. B keeps on returning fragments to A. So, although we're saying that A sends lots of messages to B, it is A's in-bound queue that fills up. Kind of counterintuitive. Anyhow, B keeps getting more fragments to return to A. Since A's queue is full, what this means is that B adds these fragments to its (B's) own pending-send list.

So, now the question is when B should retry items on its pending-send list. Presumably, it should retry when there is room on A's queue/FIFO. But OMPI (to date) has B retry *only* when B itself gets a fragment back. What's the logic? I assume the logic was that A's queue was filled with fragments that B had sent, so getting a fragment back would be an indication of A's queue opening up.

Why is this a poor indication? (I'm assuming this is what your question was.) Two possible reasons:

1) A's queue might have been filled with fragments that B was returning to A. So, B would get no acknowledgements back from A that progress was being made depleting the queue.

2) (New with OMPI 1.3.2, now that we have shared queues): A's queue might have been filled with activity from third-party processes.

In either case, the only way B now knows whether there is room on A's queue is... to check whether the queue has room! Nothing is coming back from A to indicate that the queue is being drained.
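To make the A/B scenario concrete, here is a toy model in C. It is only a sketch under simplifying assumptions (single-threaded, one fixed-size FIFO, integers standing in for fragments), and every name in it (fifo_t, pending_t, retry_pending, simulate, the two policy constants) is made up for illustration; none of it is taken from the sm BTL source. B returns fragments to A, the overflow lands on B's pending list, and A then drains its FIFO without sending anything back to B.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not Open MPI code. */
#define FIFO_SLOTS  4
#define MAX_PENDING 16

typedef struct { int slots[FIFO_SLOTS];  size_t head, count; } fifo_t;
typedef struct { int items[MAX_PENDING]; size_t head, count; } pending_t;

static bool fifo_has_room(const fifo_t *f) { return f->count < FIFO_SLOTS; }

/* Write one entry; fails (returns false) when the FIFO is full. */
static bool fifo_write(fifo_t *f, int v)
{
    if (!fifo_has_room(f)) return false;
    f->slots[(f->head + f->count) % FIFO_SLOTS] = v;
    f->count++;
    return true;
}

/* Read the oldest entry; fails when the FIFO is empty. */
static bool fifo_read(fifo_t *f, int *v)
{
    if (f->count == 0) return false;
    *v = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_SLOTS;
    f->count--;
    return true;
}

static void pending_push(pending_t *p, int v)
{
    assert(p->count < MAX_PENDING);   /* the toy list has a fixed capacity */
    p->items[(p->head + p->count) % MAX_PENDING] = v;
    p->count++;
}

/* Move pending items, oldest first, for as long as the target has room. */
static size_t retry_pending(pending_t *p, fifo_t *target)
{
    size_t moved = 0;
    while (p->count > 0 && fifo_write(target, p->items[p->head])) {
        p->head = (p->head + 1) % MAX_PENDING;
        p->count--;
        moved++;
    }
    return moved;
}

typedef enum { RETRY_ON_FRAGMENT_BACK, RETRY_WHEN_ROOM } policy_t;

/* B returns nfrags fragments to A, then A and B alternate progress steps.
 * Returns how many items are still stuck on B's pending list at the end. */
static size_t simulate(policy_t policy, int nfrags)
{
    fifo_t a_fifo = {0};
    pending_t b_pending = {0};
    int v;

    for (int i = 0; i < nfrags; i++)
        if (!fifo_write(&a_fifo, i)) pending_push(&b_pending, i);

    for (int step = 0; step < nfrags; step++) {
        fifo_read(&a_fifo, &v);        /* A consumes one returned fragment */
        const int fragments_back = 0;  /* returned fragments need no reply to B */
        if (policy == RETRY_WHEN_ROOM || fragments_back > 0)
            retry_pending(&b_pending, &a_fifo);
    }
    return b_pending.count;
}
```

In this model, simulate(RETRY_ON_FRAGMENT_BACK, 10) leaves six items stranded on B's pending list even though A has emptied its FIFO, while simulate(RETRY_WHEN_ROOM, 10) drains the list completely.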

As far as I remember the code, when we get a fragment back we add it back in the LIFO, and therefore it becomes the next available fragment for a send.

Yes, indeed, but I don't understand how this is relevant. The LIFOs (the private free lists where processes maintain unused fragments) don't really enter this discussion.

So, I'll try modifying MCA_BTL_SM_FIFO_WRITE. It'll start by checking if there are pending sends. If so, it'll retry them before performing the requested write. This should also help preserve ordering a little better. I'm guessing this will not hurt our message latency in any meaningful way, but I'll check this out.
Meanwhile, I wanted to check in with y'all for any guidance you might have.
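
A rough sketch of what "retry pending sends before the requested write" could look like, written as a plain C function rather than as the real MCA_BTL_SM_FIFO_WRITE macro. The names (send_or_queue, drain_pending, delivered_in_order) and the data structures are hypothetical, and the model is single-threaded with no shared FIFOs. Queuing the new item behind older pending ones when the list is still non-empty is an assumption made here to keep strict ordering; it is not something the ticket prescribes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not Open MPI code. */
#define FIFO_SLOTS  4
#define MAX_PENDING 16

typedef struct { int slots[FIFO_SLOTS];  size_t head, count; } fifo_t;
typedef struct { int items[MAX_PENDING]; size_t head, count; } pending_t;

static bool fifo_write(fifo_t *f, int v)
{
    if (f->count == FIFO_SLOTS) return false;          /* FIFO is full */
    f->slots[(f->head + f->count) % FIFO_SLOTS] = v;
    f->count++;
    return true;
}

static bool fifo_read(fifo_t *f, int *v)
{
    if (f->count == 0) return false;                   /* FIFO is empty */
    *v = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_SLOTS;
    f->count--;
    return true;
}

/* Retry pending sends, oldest first, while the target FIFO has room. */
static void drain_pending(pending_t *p, fifo_t *target)
{
    while (p->count > 0 && fifo_write(target, p->items[p->head])) {
        p->head = (p->head + 1) % MAX_PENDING;
        p->count--;
    }
}

/* Returns true if v went straight into the FIFO, false if it was queued. */
static bool send_or_queue(fifo_t *target, pending_t *p, int v)
{
    drain_pending(p, target);          /* step 1: older sends go first */

    /* Step 2: write directly only if nothing older is still waiting.
     * In this single-writer model a non-empty list already implies a
     * full FIFO, so the count check is just a guard for ordering. */
    if (p->count == 0 && fifo_write(target, v)) return true;

    assert(p->count < MAX_PENDING);    /* the toy list has a fixed capacity */
    p->items[(p->head + p->count) % MAX_PENDING] = v;
    p->count++;
    return false;
}

/* Sender posts 0..n-1 while the receiver reads one entry per three sends,
 * then everything is flushed. Returns true if entries arrive in order. */
static bool delivered_in_order(int n)
{
    fifo_t f = {0};
    pending_t p = {0};
    int expect = 0, v;

    for (int i = 0; i < n; i++) {
        send_or_queue(&f, &p, i);
        if (i % 3 == 2 && fifo_read(&f, &v)) {
            if (v != expect) return false;
            expect++;
        }
    }
    while (f.count > 0 || p.count > 0) {
        drain_pending(&p, &f);
        if (fifo_read(&f, &v)) {
            if (v != expect) return false;
            expect++;
        }
    }
    return expect == n;
}
```

The extra cost on the fast path in this sketch is a single check of the pending count when the list is empty, which is the case the latency measurement would need to confirm.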
