The lengths we go to avoid progress :-)
On 11/7/07 10:19 PM, "Richard Graham" <rlgra...@ornl.gov> wrote:
> The real problem, as you and others have pointed out is the lack of
> predictable time slices for the progress engine to do its work, when relying
> on the ULP to make calls into the library...
>
> Rich
>
>
> On 11/8/07 12:07 AM, "Brian Barrett" <brbar...@open-mpi.org> wrote:
>
>> As it stands today, the problem is that we can inject things into the
>> BTL successfully that are not injected into the NIC (due to software
>> flow control). Once a message is injected into the BTL, the PML marks
>> completion on the MPI request. If it was a blocking send that got
>> marked as complete, but the message isn't injected into the NIC/NIC
>> library, and the user doesn't re-enter the MPI library for a
>> considerable amount of time, then we have a problem.
>>
>> Personally, I'd rather just not mark MPI completion until a local
>> completion callback from the BTL. But others don't like that idea, so
>> we came up with a way for back pressure from the BTL to say "it's not
>> on the wire yet". This is more complicated than just not marking MPI
>> completion early, but why would we do something that helps real apps
>> at the expense of benchmarks? That would just be silly!
>>
>> Brian
>>
>> On Nov 7, 2007, at 7:56 PM, Richard Graham wrote:
>>
>>> Does this mean that we don't have a queue to store btl level
>>> descriptors that are only partially complete ? Do we do an all or
>>> nothing with respect to btl level requests at this stage ?
>>>
>>> Seems to me like we want to mark things complete at the MPI level
>>> ASAP, and that this proposal is not to do that is this correct ?
>>>
>>> Rich
>>>
>>>
>>> On 11/7/07 11:26 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>>>
>>>> On Nov 7, 2007, at 9:33 PM, Patrick Geoffray wrote:
>>>>
>>>>>> Remember that this is all in the context of Galen's proposal for
>>>>>> btl_send() to be able to return NOT_ON_WIRE -- meaning that the
>>>>>> send was successful, but it has not yet been sent (e.g., openib
>>>>>> BTL buffered it because it ran out of credits).
>>>>>
>>>>> Sorry if I miss something obvious, but why does the PML has to be
>>>>> aware of the flow control situation of the BTL ? If the BTL cannot
>>>>> send something right away for any reason, it should be the
>>>>> responsibility of the BTL to buffer it and to progress on it later.
>>>>
>>>>
>>>> That's currently the way it is. But the BTL currently only has the
>>>> option to say two things:
>>>>
>>>> 1. "ok, done!" -- then the PML will think that the request is
>>>>    complete
>>>> 2. "doh -- error!" -- then the PML thinks that Something Bad
>>>>    Happened(tm)
>>>>
>>>> What we really need is for the BTL to have a third option:
>>>>
>>>> 3. "not done yet!"
>>>>
>>>> So that the PML knows that the request is not yet done, but will
>>>> allow other things to progress while we're waiting for it to
>>>> complete. Without this, the openib BTL currently replies "ok,
>>>> done!", even when it has only buffered a message (rather than
>>>> actually sending it out). This optimization works great (yeah, I
>>>> know...) except for apps that don't dip into the MPI library
>>>> frequently. :-\
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> de...@open-mpi.org
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
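
For anyone reading this thread later, here is a rough C sketch of the three-way return discussed above ("ok, done!", "doh -- error!", "not done yet!"). It is not Open MPI's actual BTL or PML interface. Every type and function name below is hypothetical, and only the NOT_ON_WIRE idea is borrowed from the thread. The credit counter and the fixed-size pending queue are simplifying assumptions standing in for real flow control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 8  /* assumed queue depth, purely illustrative */

/* Hypothetical status codes; only the NOT_ON_WIRE idea is from the thread. */
typedef enum {
    SKETCH_SEND_DONE,         /* "ok, done!": handed to the NIC */
    SKETCH_SEND_NOT_ON_WIRE,  /* "not done yet!": accepted, but only buffered */
    SKETCH_SEND_ERROR         /* "doh -- error!" */
} sketch_send_status;

/* Stand-in for an MPI request as seen by the upper layer. */
typedef struct {
    bool complete;  /* what a wait/test call would observe */
    bool failed;
} sketch_request;

/* Stand-in for a BTL with software flow control. */
typedef struct {
    int credits;                                  /* sends allowed right now */
    sketch_request *pending[SKETCH_MAX_PENDING];  /* buffered, not yet on the wire */
    size_t npending;
} sketch_btl;

/* BTL side: send if a credit is available, otherwise buffer and say so. */
static sketch_send_status sketch_btl_send(sketch_btl *btl, sketch_request *req)
{
    assert(btl != NULL && req != NULL);
    if (btl->credits > 0) {
        btl->credits--;
        return SKETCH_SEND_DONE;
    }
    if (btl->npending == SKETCH_MAX_PENDING) {
        return SKETCH_SEND_ERROR;  /* no room left to buffer */
    }
    btl->pending[btl->npending++] = req;
    return SKETCH_SEND_NOT_ON_WIRE;
}

/* PML side: mark completion only when the BTL reports the send as done. */
static void sketch_pml_send(sketch_btl *btl, sketch_request *req)
{
    assert(btl != NULL && req != NULL);
    req->complete = false;
    req->failed = false;
    switch (sketch_btl_send(btl, req)) {
    case SKETCH_SEND_DONE:
        req->complete = true;
        break;
    case SKETCH_SEND_NOT_ON_WIRE:
        /* Leave the request incomplete; progress will complete it later. */
        break;
    case SKETCH_SEND_ERROR:
        req->failed = true;
        break;
    }
}

/* Progress: credits came back, so drain buffered sends in FIFO order and
 * complete each one (this plays the role of a local completion callback). */
static void sketch_btl_progress(sketch_btl *btl, int returned_credits)
{
    size_t sent = 0;
    assert(btl != NULL && returned_credits >= 0);
    btl->credits += returned_credits;
    while (sent < btl->npending && btl->credits > 0) {
        btl->credits--;
        btl->pending[sent]->complete = true;
        sent++;
    }
    /* Shift whatever is still waiting to the front of the queue. */
    for (size_t i = sent; i < btl->npending; i++) {
        btl->pending[i - sent] = btl->pending[i];
    }
    btl->npending -= sent;
}
```

With one credit available, a first sketch_pml_send() completes immediately, a second stays incomplete until sketch_btl_progress() is called with a returned credit. That second case is exactly the one that hurts apps that rarely re-enter the library: nothing drains the queue unless something calls progress.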