What specifically do you have in mind?

After talking with Jeff, I withdraw my request to change the approach.  This
is a good approach when one wants to send warnings to some sort of logging
system, in addition to errors.  Sending the data upstream like I suggested
can't rely on the error return code, and as such requires a check on every
return - a bad idea.

If the call is for a discussion beyond this, that is fine with me, but it
would be more useful once a concrete idea on how to implement step 4 is
reached.  If people have specific ideas, an early call would be good;
otherwise I would expect that in early Jan we would be better prepared to
talk about specifics.

The copy and branch approach is not practical - it doubles the maintenance
work, and the point is to leverage ongoing work.

Rich


On 12/4/08 5:15 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> A physical meeting about this in the near future is unlikely; I think
> we're all facing travel restrictions and constraints with the holidays
> coming up.
> 
> How about a teleconf to discuss the following about the notifier:
> 
> - what exactly is there today
> - why what is there today is the way it is
> - discuss proposals on different ways to do it
> 
> More specifically, I think we all agree that the idea of an MPI
> application notifying a higher-level entity when it detects errors is
> a good one (e.g., on the host, or in the network, or ...).  I think
> that it is worth discussing in higher bandwidth so that we can avoid
> email hell (I agree with Ralph; this could devolve pretty easily).
> 
> I propose any of the following times to discuss (I'll set up a phone
> bridge):
> 
> - Mon, Dec 8, 2pm, 3pm, or 4pm Eastern
> - Tue, Dec 9, 10am, noon, 1pm, 2pm, 3pm, or 4pm Eastern
> - Wed, Dec 10, any time
> - Thu, Dec 11, 11am, 1pm, 2pm, 3pm, or 4pm Eastern
> - Fri, Dec 12, 9am, 10am, 11am, 2pm, 3pm, or 4pm Eastern
> 
> 
> 
> 
> On Dec 4, 2008, at 3:16 PM, Ralph Castain wrote:
> 
>> > I'm beginning to believe that we need a design meeting specifically
>> > over this question. Too many unknowns exist, with significant
>> > potential problems lurking behind them. Frankly, this issue could
>> > have a major impact on how we operate, performance, and a variety of
>> > other factors going forward - many of which may be difficult to
>> > predict.
>> >
>> > I suspect there may not be "optimal" solutions to many of these
>> > questions, but there certainly will be strong opinions in multiple
>> > directions.
>> >
>> > As part of that discussion, I propose that we consider alternative
>> > methods for meeting the same overall objective - namely, reuse of
>> > the BTLs by another software project. For example, a simple
>> > copy-and-branch is the dominant method today, with patches used by both
>> > parties to cherry-pick the changes they want from the other code
>> > users. Multiple tools have been developed to support this mode of
>> > operation, yet we haven't discussed any of them in this context. The
>> > proposed approach contains a number of impacts that may be avoided
>> > with an alternative approach.
>> >
>> > Without such a meeting, I fear we are going to rapidly dissolve into
>> > email hell again.
>> >
>> > Ralph
>> >
>> >
>> >
>> > On Dec 4, 2008, at 1:07 PM, Eugene Loh wrote:
>> >
>>> >> Richard Graham wrote:
>>>> >>>
>>>> >>> I expect this will involve some sort of well defined interface
>>>> >>> between the btl's and orte, and I don't know if this will also
>>>> >>> require something like this between the btl's and the pml - I
>>>> >>> think that interface is rigidly enforced, but am not sure.
>>> >> I'm probably missing the scope of what you're saying here, but it
>>> >> raises another question in my mind.  Is there today a well-defined
>>> >> interface between the BTLs and... anything else?  PML or whatever?
>>> >> Maybe this comes back to a documentation question:  do we (or will
>>> >> we) have anything written down that says what a BTL must do, what
>>> >> it may rely on, etc.?
>>> >> _______________________________________________
>>> >> devel mailing list
>>> >> de...@open-mpi.org
>>> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> >
> 
> 
> --
> Jeff Squyres
> Cisco Systems
> 
> 
> 
