What specifically do you have in mind? After talking with Jeff, I withdraw my request to change the approach. It is a good approach when one wants to send warnings, in addition to errors, to some sort of logging system. Sending the data upstream as I suggested can't rely on the error return code, and so requires a check on every return, which is a bad idea.
If the call is for a discussion beyond this, that is fine with me, but it would be more useful once we have a concrete idea of how to implement step 4. If people have specific ideas, an early call would be good; otherwise I expect we would be better prepared to talk about specifics in early January. The copy-and-branch approach is not practical: it doubles the maintenance work, and the point is to leverage ongoing work.

Rich

On 12/4/08 5:15 PM, "Jeff Squyres" <jsquy...@cisco.com> wrote:

> The likelihood of a physical meeting about this in the near future is
> unlikely; I think we're all facing travel restrictions and constraints
> with the holidays coming up.
>
> How about a teleconf to discuss the following about the notifier:
>
> - what exactly is there today
> - why what is there today is the way it is
> - discuss proposals on different ways to do it
>
> More specifically, I think we all agree that the idea of an MPI
> application notifying a higher-level entity when it detects errors is
> a good one (e.g., on the host, or in the network, or ...). I think
> that it is worth discussing in higher bandwidth so that we can avoid
> email hell (I agree with Ralph; this could devolve pretty easily).
>
> I propose any of the following times to discuss (I'll setup a phone
> bridge):
>
> - Mon, Dec 8, 2pm, 3pm, or 4pm Eastern
> - Tue, Dec 9, 10am, noon, 1pm, 2pm, 3pm, or 4pm Eastern
> - Wed, Dec 10, any time
> - Thu, Dec 11, 11am, 1pm, 2pm, 3pm, or 4pm Eastern
> - Fri, Dec 12, 9am, 10am, 11am, 2pm, 3pm, or 4pm Eastern
>
> On Dec 4, 2008, at 3:16 PM, Ralph Castain wrote:
>
>> I'm beginning to believe that we need a design meeting specifically
>> over this question. Too many unknowns exist, with significant
>> potential problems lurking behind them. Frankly, this issue could
>> have a major impact on how we operate, performance, and a variety of
>> other factors going forward - many of which may be difficult to
>> predict.
>>
>> I suspect there may not be "optimal" solutions to many of these
>> questions, but there certainly will be strong opinions in multiple
>> directions.
>>
>> As part of that discussion, I propose that we consider alternative
>> methods for meeting the same overall objective - namely, reuse of
>> the BTL's by another software project. For example, a simple
>> copy-and-branch is the dominant method today, with patches used by
>> both parties to cherry-pick the changes they want from the other code
>> users. Multiple tools have been developed to support this mode of
>> operation, yet we haven't discussed any of them in this context. The
>> proposed approach contains a number of impacts that may be avoided
>> with an alternative approach.
>>
>> Without such a meeting, I fear we are going to rapidly dissolve into
>> email hell again.
>>
>> Ralph
>>
>> On Dec 4, 2008, at 1:07 PM, Eugene Loh wrote:
>>
>>> Richard Graham wrote:
>>>>
>>>> I expect this will involve some sort of well defined interface
>>>> between the btl's and orte, and I don't know if this will also
>>>> require something like this between the btl's and the pml I
>>>> think that interface is rigidly enforced, but am not sure.
>>> I'm probably missing the scope of what you're saying here, but it
>>> raises another question in my mind. Is there today a well-defined
>>> interface between the BTLs and... anything else? PML or whatever?
>>> Maybe this comes back to a documentation question: do we (or will
>>> we) have anything written down that says what a BTL must do, what
>>> it may rely on, etc.?
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> --
> Jeff Squyres
> Cisco Systems