Josh,

You mentioned some MCA parameters that you would include in the email, but I don't see those parameters anywhere. Could you please add them here to make testing easier for people?
Wesley

On Wed, Mar 10, 2010 at 1:26 PM, Josh Hursey <jjhur...@open-mpi.org> wrote:
> Yesterday evening George, Thomas and I discussed some of their concerns about this RFC at the MPI Forum meeting. After the discussion, we seemed to be in agreement that the RecoS framework is a good idea and the concepts and fixes in this RFC should move forward with a couple of notes:
>
> - They wanted to test the branch a bit more over the next couple of days. Some MCA parameters that you will need are at the bottom of this message.
>
> - Reiterate that this RFC only addresses ORTE stability, not OMPI stability. The OMPI stability extension is a second step for the line of work, and should/will fit in nicely with the RecoS framework being proposed in this RFC. The OMPI layer stability will require a significant amount of work, but the RecoS framework will provide the ORTE layer stability that is required as a foundation for OMPI layer stability in the future.
>
> - The purpose of the ErrMgr becomes slightly unclear with the addition of the RecoS framework, since both are focused on responding to faults in the system (and RecoS, when enabled, overrides most/all of the ErrMgr functionality). Should the RecoS framework be merged with the ErrMgr framework to create a new ErrMgr interface?
>
> We are trying to decide if we should merge these frameworks, but at this point we are interested in hearing how other developers feel about merging the ErrMgr and RecoS frameworks, which would change the ErrMgr API. Are there any developers out there that are developing ErrMgr components, or are using any particular features of the existing ErrMgr framework that they would like to see preserved in the next revision? By default, the existing abort behavior of the ErrMgr framework will be preserved, so the user will have to 'opt-in' to any fault recovery capabilities.
>
> So we are continuing the discussion a bit more off-list, and will return to the list with an updated RFC (and possibly a new branch) soon (hopefully end of the week/early next week). I would like to briefly discuss this RFC at the Open MPI teleconf next Tuesday.
>
> -- Josh
>
> On Feb 26, 2010, at 8:06 AM, Josh Hursey wrote:
>
> > Sounds good to me.
> >
> > For those casually following this RFC let me summarize its current state.
> >
> > Josh and George (and anyone else attending the forum who wishes to participate) will meet sometime at the next MPI Forum meeting (March 8-10). I will post any relevant notes from this meeting back to the list afterwards. So the RFC is on hold pending the outcome of that meeting. For those developers interested in this RFC that will not be able to attend, feel free to continue using this thread for discussion.
> >
> > Thanks,
> > Josh
> >
> > On Feb 26, 2010, at 6:09 AM, George Bosilca wrote:
> >
> >>
> >> On Feb 26, 2010, at 01:50 , Josh Hursey wrote:
> >>
> >>> Any of those options are fine with me. I was thinking that if you wanted to talk sooner, we might be able to help explain our intentions with this framework a bit better. I figure that the framework interface will change a bit as we all advance and incorporate our various techniques into it. I think that the current interface is a good first step, but there are certainly many more steps to come.
> >>>
> >>> I am fine delaying this code a bit, just not too long. Meeting at the forum for a while might be a good option (we could probably even arrange to call in others if you wanted).
> >>
> >> Sounds good, let's do this.
> >>
> >> Thanks,
> >> george.
> >>
> >>>
> >>> Cheers,
> >>> Josh
> >>>
> >>> On Feb 25, 2010, at 6:45 PM, Ralph Castain wrote:
> >>>
> >>>> If Josh is going to be at the forum, perhaps you folks could chat there? Might as well take advantage of being colocated, if possible.
> >>>>
> >>>> Otherwise, I'm available pretty much any time. I can't contribute much about the MPI recovery issues, but can contribute to the RTE issues if that helps.
> >>>>
> >>>>
> >>>> On Thu, Feb 25, 2010 at 7:39 PM, George Bosilca <bosi...@eecs.utk.edu> wrote:
> >>>> Josh,
> >>>>
> >>>> Next week is a little bit too early, as we will need some time to figure out how to integrate with this new framework, and to what extent our code and requirements fit into it. Then the week after is the MPI Forum. How about Thursday, 11 March?
> >>>>
> >>>> Thanks,
> >>>> george.
> >>>>
> >>>> On Feb 25, 2010, at 12:46 , Josh Hursey wrote:
> >>>>
> >>>>> Per my previous suggestion, would it be useful to chat on the phone early next week about our various strategies?
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> de...@open-mpi.org
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel