On Wed, 24 Jun 2009, Eugene Loh wrote:
Brian Barrett wrote:
Or go to what I proposed and USE A LINKED LIST! (as I said before, not an
original idea, but one I think has merit) Then you don't have to size the
fifo, because there isn't a fifo. Limit the number of send fragments any
one proc can allocate and the only place memory can grow without bound is
the OB1 unexpected list. Then use SEND_COMPLETE instead of SEND_NORMAL in
the collectives without barrier semantics (bcast, reduce, gather, scatter)
and you effectively limit how far ahead any one proc can get to something
that we can handle, with no performance hit.
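
(For concreteness, a rough sketch of that send-mode switch, in the shape of a
linear broadcast root loop. The function name below is made up, and the
includes, tag, and PML call signature are recalled from memory rather than
copied from the tree, so treat it as illustrative only:)

  #include "ompi/constants.h"
  #include "ompi/communicator/communicator.h"
  #include "ompi/mca/coll/base/coll_tags.h"
  #include "ompi/mca/pml/pml.h"

  /* Hypothetical root-side loop of a linear bcast, showing the send-mode
   * change: SEND_COMPLETE instead of SEND_STANDARD on each outgoing send. */
  static int bcast_root_linear(void *buff, int count,
                               struct ompi_datatype_t *datatype,
                               struct ompi_communicator_t *comm)
  {
      int i, err;
      int rank = ompi_comm_rank(comm);
      int size = ompi_comm_size(comm);

      for (i = 0; i < size; ++i) {
          if (i == rank) {
              continue;
          }
          err = MCA_PML_CALL(send(buff, count, datatype, i,
                                  MCA_COLL_BASE_TAG_BCAST,
                                  MCA_PML_BASE_SEND_COMPLETE,  /* was STANDARD */
                                  comm));
          if (OMPI_SUCCESS != err) {
              return err;
          }
      }
      return OMPI_SUCCESS;
  }

The intent, per the paragraph above, is that each blocking send in the
collective doesn't complete until its fragments have actually gone out, so the
root can't buffer its way arbitrarily far ahead of slow receivers.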
I'm still digesting George's mail and trac comments and responses thereto.
Meanwhile, a couple of questions here.
First, I think it'd be helpful if you said a few words about how you think a
linked list should be used here. I can think of a couple of different ways,
and I have questions about each way. Instead of my enumerating these ways
and those questions, how about you just be more specific? (We used to grow
the FIFOs, so sizing them didn't use to be an issue.)
My thought is to essentially implement a good chunk of the Nemesis design
from MPICH, so reading that paper might give background on where I'm
coming from. But if it were me....
1) Always limit the number of send fragments that can be allocated to
something small. This gives us a concrete upper bound on the size of the
shared memory region we need to allocate.
2) Rather than a FIFO in which we put offset pointers, which requires a
large amount of memory (p * num_frags), a linked list folds that cost into
the fragments themselves - it's two link fields per fragment, plus some
constant overhead for the LL structure.
3) On insert, either acquire the lock for the LL and insert at the tail of
the list, or use atomic ops to update the tail of the list (the Nemesis
paper talks about the atomic sequence; a rough sketch follows this list).
Because there's no FIFO to fill up, there are no deadlock issues.
4) If, on send, you don't have any send fragments available (they're a
constrained resource), drain your incoming queue to collect acks - if that
still doesn't free up any fragments, return failure to the upper layer and
let it try again.
5) I can see how #4 might cause issues, as the draining of the queue might
actually result in more send requests. In this case, I'd be tempted to
have two linked lists (they're small, after all), one for incoming
fragments and one for acks. This wasn't an option with the FIFOs, due to
their large size.
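
To make 2) and 3) a little more concrete, here's a minimal sketch of that
queue in the style of the Nemesis paper. All the names are made up (frag_t,
ll_fifo_t, ll_enqueue, ll_dequeue), it uses raw pointers and GCC __sync
builtins for brevity, and it substitutes a portable CAS loop for the paper's
single swap instruction. The real sm BTL version would store offsets into the
shared segment, use opal atomics, and add the appropriate write barriers.

  #include <stddef.h>

  typedef struct frag {
      struct frag *volatile next;  /* the link lives inside the fragment */
      /* ... match header, payload, etc. ... */
  } frag_t;

  typedef struct {
      frag_t *volatile head;       /* read/written by the one receiver   */
      frag_t *volatile tail;       /* updated atomically by any sender   */
  } ll_fifo_t;

  /* Sender: swing the tail to the new fragment, then link the old tail to
   * it.  The list can always grow, so there is no "FIFO full" deadlock;
   * the memory bound comes from the small per-peer fragment freelist. */
  static void ll_enqueue(ll_fifo_t *q, frag_t *f)
  {
      frag_t *prev;

      f->next = NULL;
      do {                               /* atomic swap of q->tail       */
          prev = q->tail;
      } while (!__sync_bool_compare_and_swap(&q->tail, prev, f));

      if (NULL == prev) {
          q->head = f;                   /* queue was empty              */
      } else {
          prev->next = f;
      }
  }

  /* Receiver (single consumer): returns NULL if the queue is empty. */
  static frag_t *ll_dequeue(ll_fifo_t *q)
  {
      frag_t *f = q->head;

      if (NULL == f) {
          return NULL;
      }
      if (NULL != f->next) {
          q->head = f->next;
      } else {
          /* f looks like the last element: try to reset the queue.  If
           * the CAS fails, a sender has already swapped the tail but not
           * yet written f->next, so wait for the link to appear. */
          q->head = NULL;
          if (!__sync_bool_compare_and_swap(&q->tail, f, NULL)) {
              while (NULL == f->next) {
                  ;                      /* sender between swap and link */
              }
              q->head = f->next;
          }
      }
      return f;
  }

With something like this in place, 4) is just: when the freelist is empty,
call ll_dequeue() on your own incoming list(s) until an ack hands a fragment
back, and otherwise return failure up to the upper layer so it can retry.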
Second, I'm curious how elaborate of a fix I should be trying for here. Are
we looking for something to fix the problems at hand, or are we opening the
door to rearchitecting a big part of the sm BTL?
Well, like Ralph said, I worry about whether we can strap another bandaid
on and keep it working. If we can, great. But George's proposal seems
like it undoes all the memory savings work you did, and that worries me.
Brian