On Fri, Jun 10, 2011 at 8:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Jun 10, 2011, at 6:38 AM, Josh Hursey wrote:
>
>> Another problem with this patch, that I mentioned to Wesley and George
>> off list, is that it does not handle the case when mpirun/HNP is also
>> hosting processes that might fail. In my testing of the patch it
>> worked fine if mpirun/HNP was -not- hosting any processes, but once it
>> had to host processes then unexpected behavior occurred when a process
>> failed. So for those just listening to this thread, Wesley is working
>> on a revised patch to address this problem that he will post when it
>> is ready.
>
> See my other response to the patch - I think we need to understand why we are
> storing state in multiple places as it can create unexpected behavior when
> things are out-of-sync.
>
>>
>> As far as the RML issue, doesn't the ORTE state machine branch handle
>> that case? If it does, then let's push the solution to that problem
>> until that branch comes around instead of solving it twice.
>
> No, it doesn't - in fact, it's what breaks the current method. Because we no
> longer allow event recursion, the RML message never gets out of the app.
> Hence my question.
>
> I honestly don't think we need to have orte be aware of the distinction
> between "aborted by cmd" and "aborted by signal" as the only diff is in the
> error message. There ought to be some other way of resolving this?

MPI_Abort will need to tell ORTE which processes should be 'aborted by
signal' along with the calling process. So there needs to be a mechanism
for that as well. Not sure if I have a good solution to this in mind
just yet.

A thought though: in the state machine version, the process calling
MPI_Abort could post a message to the processing thread and return from
the callback. The callback would have a check at the bottom to determine
if MPI_Abort was triggered within the callback, and just sleep. The
processing thread would progress the RML message and, once finished, call
exit(). This implies that the application process has a separate
processing thread. But I think we might be able to post the RML message
in the callback, then wait for it to complete outside of the callback
before returning control to the user. :/

Interesting.

-- Josh

>
>>
>> -- Josh
>>
>> On Fri, Jun 10, 2011 at 8:22 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Something else you might want to address in here: the current code sends an
>>> RML message from the proc calling abort to its local daemon telling the
>>> daemon that we are exiting due to the app calling "abort". We needed to do
>>> this because we wanted to flag the proc termination as one induced by the
>>> app itself as opposed to something like a segfault or termination by signal.
>>>
>>> However, the problem is that the app may be calling abort from within an
>>> event handler. Hence, the RML send (which is currently blocking) will never
>>> complete once we no longer allow event lib recursion (coming soon). If we
>>> use a non-blocking send, then we can't know for sure that the message has
>>> been sent before we terminate.
>>>
>>> What we need is a non-messaging way of communicating that this was an
>>> ordered abort as opposed to a segfault or other failure.
>>> Prior to the current method, we had the app drop a file that the daemon
>>> looked for as an "abort marker", but that was ugly as it sometimes caused
>>> us to not properly clean up the session directory tree.
>>>
>>> I'm open to suggestion - perhaps it isn't actually all that critical for us
>>> to distinguish "aborted by call to abort" from "aborted by signal", and we
>>> can just have the app commit suicide via self-imposed SIGKILL? It is only
>>> the message output to the user at the end of the job that differs - and
>>> since MPI_Abort already provides a message indicating "we called abort", is
>>> it really necessary that we have orte aware of that distinction?
>>>
>>>
>>> On Jun 9, 2011, at 6:12 PM, Joshua Hursey wrote:
>>>
>>>>
>>>> On Jun 9, 2011, at 6:47 PM, George Bosilca wrote:
>>>>
>>>>> Well, you're way too trusty. ;)
>>>>
>>>> It's the midwestern boy in me :)
>>>>
>>>>>
>>>>> This only works if all components play the game, and even then it is
>>>>> difficult if you want to allow components to deregister themselves in the
>>>>> middle of the execution. The problem is that a callback will be the
>>>>> "previous" for some component, and when you want to remove a callback you
>>>>> have to inform the "next" component on the callback chain to change its
>>>>> previous.
>>>>
>>>> This is a fair point. I think hiding the ordering of callbacks in the
>>>> errmgr could be dangerous since it takes control from the upper layers,
>>>> but, conversely, trusting the upper layers to 'do the right thing' with
>>>> the previous callback is probably too optimistic, esp. for layers that are
>>>> not designed together.
>>>>
>>>> To that I would suggest that you leave the code as is - registering a
>>>> callback overwrites the existing callback. That will allow me to replace
>>>> the default OMPI callback when I am able to in MPI_Init, and, if I need
>>>> to, swap back in the default version at MPI_Finalize.
>>>>
>>>> Does that sound like a reasonable way forward on this design point?
>>>>
>>>> -- Josh
>>>>
>>>>>
>>>>> george.
>>>>>
>>>>> On Jun 9, 2011, at 13:21 , Josh Hursey wrote:
>>>>>
>>>>>> So the "Resilient ORTE" patch has a registration in ompi_mpi_init.c:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&ompi_errhandler_runtime_callback);
>>>>>> -------------
>>>>>>
>>>>>> Which is a callback that just calls abort (which is what we want to do
>>>>>> by default):
>>>>>> -------------
>>>>>> void ompi_errhandler_runtime_callback(orte_process_name_t *proc) {
>>>>>>     ompi_mpi_abort(MPI_COMM_WORLD, 1, false);
>>>>>> }
>>>>>> -------------
>>>>>>
>>>>>> This is what I want to replace. I do -not- want ompi to abort just
>>>>>> because a process failed. So I need a way to replace or remove this
>>>>>> callback, and put in my own callback that 'does the right thing'.
>>>>>>
>>>>>> The current patch allows me to overwrite the callback when I call:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&my_callback);
>>>>>> -------------
>>>>>> Which is fine with me.
>>>>>>
>>>>>> At the point I do not want my_callback to be active any more (say in
>>>>>> MPI_Finalize) I would like to replace it with the old callback. To do
>>>>>> so, with the patch's interface, I would have to know what the previous
>>>>>> callback was and do:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&ompi_errhandler_runtime_callback);
>>>>>> -------------
>>>>>>
>>>>>> This comes at a slight maintenance burden since now there will be two
>>>>>> places in the code that must explicitly reference
>>>>>> 'ompi_errhandler_runtime_callback' - if it ever changed then both
>>>>>> sites would have to be updated.
>>>>>>
>>>>>>
>>>>>> If you use the 'sigaction-like' interface then upon registration I
>>>>>> would get the previous handler back (which would point to
>>>>>> 'ompi_errhandler_runtime_callback'), and I can store it for later:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&my_callback, &prev_callback);
>>>>>> -------------
>>>>>>
>>>>>> And when it comes time to deregister my callback all I need to do is
>>>>>> replace it with the previous callback - which I have a reference to,
>>>>>> but do not need the explicit name of (passing NULL as the second
>>>>>> argument tells the registration function that I don't care about the
>>>>>> current callback):
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(prev_callback, NULL);
>>>>>> -------------
>>>>>>
>>>>>>
>>>>>> So the API in the patch is fine, and I can work with it. I just
>>>>>> suggested that it might be slightly better to return the previous
>>>>>> callback (as is done in other standard interfaces - e.g., sigaction)
>>>>>> in case we wanted to do something with it later.
>>>>>>
>>>>>>
>>>>>> What seems to be proposed now is making the errmgr keep a list of all
>>>>>> registered callbacks and call them in some order. This seems odd, and
>>>>>> definitely more complex. Maybe it was just not well explained.
>>>>>>
>>>>>> Maybe that is just the "computer scientist" in me :)
>>>>>>
>>>>>> -- Josh
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 9, 2011 at 1:05 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> You mean you want the abort API to point somewhere else, without using
>>>>>>> a new component?
>>>>>>> Perhaps a telecon would help resolve this quicker? I'm available
>>>>>>> tomorrow or anytime next week, if that helps.
>>>>>>>
>>>>>>> On Thu, Jun 9, 2011 at 11:02 AM, Josh Hursey <jjhur...@open-mpi.org> wrote:
>>>>>>>>
>>>>>>>> As long as there is the ability to remove and replace a callback I'm fine.
>>>>>>>> I personally think that forcing the errmgr to track ordering of
>>>>>>>> callback registration makes it a more complex solution, but as long as
>>>>>>>> it works.
>>>>>>>>
>>>>>>>> In particular I need to replace the default 'abort' errmgr call in
>>>>>>>> OMPI with something else. If both are called, then this does not help
>>>>>>>> me at all - since the abort behavior will be activated either before
>>>>>>>> or after my callback. So can you explain how I would do that with the
>>>>>>>> current or the proposed interface?
>>>>>>>>
>>>>>>>> -- Josh
>>>>>>>>
>>>>>>>> On Thu, Jun 9, 2011 at 12:54 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>> I agree - let's not get overly complex unless we can clearly articulate
>>>>>>>>> a requirement to do so.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 9, 2011 at 10:45 AM, George Bosilca <bosi...@eecs.utk.edu> wrote:
>>>>>>>>>>
>>>>>>>>>> This will require exactly opposite registration and de-registration
>>>>>>>>>> order, or no de-registration at all (aka no way to unload a component).
>>>>>>>>>> Or some even more complex code to deal with internally.
>>>>>>>>>>
>>>>>>>>>> If the error manager handles the callbacks it can use the registration
>>>>>>>>>> ordering (which will be what the approach can do), and can enforce
>>>>>>>>>> that all callbacks will be called. I would rather prefer this approach.
>>>>>>>>>>
>>>>>>>>>> george.
>>>>>>>>>>
>>>>>>>>>> On Jun 9, 2011, at 08:36 , Josh Hursey wrote:
>>>>>>>>>>
>>>>>>>>>>> I would prefer returning the previous callback instead of relying on
>>>>>>>>>>> the errmgr to get the ordering right. Additionally, when I want to
>>>>>>>>>>> unregister (or replace) a callback it is easier to do that with a
>>>>>>>>>>> single interface than to introduce a new one to remove a particular
>>>>>>>>>>> callback.
>>>>>>>>>>> Register:
>>>>>>>>>>>   ompi_errmgr.set_fault_callback(my_callback, prev_callback);
>>>>>>>>>>> Deregister:
>>>>>>>>>>>   ompi_errmgr.set_fault_callback(prev_callback, old_callback);
>>>>>>>>>>> or to eliminate all callbacks (if you needed that for some reason):
>>>>>>>>>>>   ompi_errmgr.set_fault_callback(NULL, old_callback);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>
>>>>>>>> --
>>>>>>>> Joshua Hursey
>>>>>>>> Postdoctoral Research Associate
>>>>>>>> Oak Ridge National Laboratory
>>>>>>>> http://users.nccs.gov/~jjhursey

--
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
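
P.S. For anyone reading this thread later, the 'sigaction-like' registration discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: proc_name_t, fault_cb_t, set_fault_callback, report_fault, layer_init and layer_finalize are hypothetical stand-ins (the real entry point would live in the orte_errmgr module), and default_callback merely counts calls where the real default would call ompi_mpi_abort.

```c
#include <stddef.h>

/* Hypothetical stand-in for orte_process_name_t. */
typedef struct { int jobid; int vpid; } proc_name_t;

/* Signature of a fault callback (assumed for this sketch). */
typedef void (*fault_cb_t)(proc_name_t *proc);

/* The single callback slot: registering overwrites what is there. */
static fault_cb_t current_cb = NULL;

/* Install new_cb. If prev is non-NULL, the callback being replaced is
 * stored in *prev; passing NULL means the caller does not care. */
static void set_fault_callback(fault_cb_t new_cb, fault_cb_t *prev)
{
    if (prev != NULL) {
        *prev = current_cb;
    }
    current_cb = new_cb;
}

/* What the error manager would do when a process fails. */
static void report_fault(proc_name_t *proc)
{
    if (current_cb != NULL) {
        current_cb(proc);
    }
}

/* Counters so the effect of each callback can be observed. */
static int default_calls = 0;
static int custom_calls = 0;

/* Stands in for the default "abort on failure" callback. */
static void default_callback(proc_name_t *proc)
{
    (void)proc;
    default_calls++;
}

/* An upper-layer callback that 'does the right thing' instead of aborting. */
static void my_callback(proc_name_t *proc)
{
    (void)proc;
    custom_calls++;
}

/* The upper layer keeps whatever was installed before it, by reference. */
static fault_cb_t saved_cb = NULL;

/* e.g. near the end of MPI_Init: swap in my_callback, remember the old one. */
static void layer_init(void)
{
    set_fault_callback(my_callback, &saved_cb);
}

/* e.g. in MPI_Finalize: restore the old callback without naming it. */
static void layer_finalize(void)
{
    set_fault_callback(saved_cb, NULL);
}
```

The point of the design is that layer_finalize never has to mention the default callback by name, so only the original registration site needs updating if that default ever changes. It still replaces rather than chains, so the abort callback and the replacement are never both invoked for the same fault.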