Re: [OMPI devel] collective problems

Richard Graham Wed, 7 Nov 2007 23:20:02 -0500

The real problem, as you and others have pointed out, is the lack of
predictable time slices for the progress engine to do its work, when relying
on the ULP to make calls into the library...


Rich


On 11/8/07 12:07 AM, "Brian Barrett" <brbar...@open-mpi.org> wrote:

> As it stands today, the problem is that we can inject things into the
> BTL successfully that are not injected into the NIC (due to software
> flow control).  Once a message is injected into the BTL, the PML marks
> completion on the MPI request.  If it was a blocking send that got
> marked as complete, but the message isn't injected into the NIC/NIC
> library, and the user doesn't re-enter the MPI library for a
> considerable amount of time, then we have a problem.
> 
> Personally, I'd rather just not mark MPI completion until a local
> completion callback from the BTL.  But others don't like that idea, so
> we came up with a way for back pressure from the BTL to say "it's not
> on the wire yet".  This is more complicated than just not marking MPI
> completion early, but why would we do something that helps real apps
> at the expense of benchmarks?  That would just be silly!
> 
> Brian
> 
> On Nov 7, 2007, at 7:56 PM, Richard Graham wrote:
> 
>> Does this mean that we don't have a queue to store btl level
>> descriptors that are only partially complete?  Do we do an all or
>> nothing with respect to btl level requests at this stage?
>> 
>> Seems to me like we want to mark things complete at the MPI level
>> ASAP, and that this proposal is not to do that. Is this correct?
>> 
>> Rich
>> 
>> 
>> On 11/7/07 11:26 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:
>> 
>>> On Nov 7, 2007, at 9:33 PM, Patrick Geoffray wrote:
>>> 
>>>>> Remember that this is all in the context of Galen's proposal for
>>>>> btl_send() to be able to return NOT_ON_WIRE -- meaning that the
>>>>> send was successful, but it has not yet been sent (e.g., openib
>>>>> BTL buffered it because it ran out of credits).
>>>> 
>>>> Sorry if I missed something obvious, but why does the PML have to
>>>> be aware of the flow control situation of the BTL? If the BTL
>>>> cannot send something right away for any reason, it should be the
>>>> responsibility of the BTL to buffer it and to progress on it later.
>>> 
>>> 
>>> That's currently the way it is.  But the BTL currently only has the
>>> option to say two things:
>>> 
>>> 1. "ok, done!" -- then the PML will think that the request is complete
>>> 2. "doh -- error!" -- then the PML thinks that Something Bad Happened(tm)
>>> 
>>> What we really need is for the BTL to have a third option:
>>> 
>>> 3. "not done yet!"
>>> 
>>> So that the PML knows that the request is not yet done, but will
>>> allow other things to progress while we're waiting for it to complete.
>>> Without this, the openib BTL currently replies "ok, done!", even when
>>> it has only buffered a message (rather than actually sending it out).
>>> This optimization works great (yeah, I know...) except for apps that
>>> don't dip into the MPI library frequently.  :-\
>>> 
>>> --
>>> Jeff Squyres
>>> Cisco Systems
>>> 
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> 
>> 
> 
> 
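The three-way return status discussed in this thread ("ok, done!", "doh -- error!", "not done yet!") can be sketched roughly as below. This is only an illustration under stated assumptions, not Open MPI code: every sketch_* name, the credit counter, and the single pending slot are hypothetical, and only the NOT_ON_WIRE idea comes from the messages above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes; only the NOT_ON_WIRE idea is from the thread. */
typedef enum {
    SKETCH_SEND_DONE,        /* 1. "ok, done!"     */
    SKETCH_SEND_ERROR,       /* 2. "doh -- error!" */
    SKETCH_SEND_NOT_ON_WIRE  /* 3. "not done yet!": accepted, only buffered */
} sketch_send_status_t;

/* Hypothetical stand-in for an MPI-level send request. */
typedef struct {
    bool complete;
    bool failed;
} sketch_request_t;

/* Hypothetical stand-in for a transport with credit-based flow control. */
typedef struct {
    int credits;               /* sends that can go out right now */
    sketch_request_t *pending; /* one buffered send, to keep the sketch short */
} sketch_btl_t;

/* Transport layer: send if a credit is available, otherwise buffer. */
sketch_send_status_t sketch_btl_send(sketch_btl_t *btl, sketch_request_t *req)
{
    assert(btl != NULL && req != NULL);
    if (btl->credits > 0) {
        btl->credits--;
        return SKETCH_SEND_DONE;
    }
    if (btl->pending != NULL) {
        return SKETCH_SEND_ERROR; /* this sketch buffers only one send */
    }
    btl->pending = req;
    return SKETCH_SEND_NOT_ON_WIRE;
}

/* Upper layer: mark completion only if the message really went out. */
void sketch_pml_send(sketch_btl_t *btl, sketch_request_t *req)
{
    switch (sketch_btl_send(btl, req)) {
    case SKETCH_SEND_DONE:
        req->complete = true;
        break;
    case SKETCH_SEND_ERROR:
        req->failed = true;
        break;
    case SKETCH_SEND_NOT_ON_WIRE:
        break; /* leave the request incomplete until credits return */
    }
}

/* Progress: when credits return, push out the buffered send and signal
 * local completion for it. */
void sketch_btl_progress(sketch_btl_t *btl, int new_credits)
{
    btl->credits += new_credits;
    if (btl->pending != NULL && btl->credits > 0) {
        btl->credits--;
        btl->pending->complete = true;
        btl->pending = NULL;
    }
}
```

Note that in this sketch a buffered send completes only when something calls sketch_btl_progress(), which is the same dependence on the application re-entering the library that the thread is worried about.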
