On Fri, Jun 10, 2011 at 8:51 AM, Ralph Castain <r...@open-mpi.org> wrote:
>
> On Jun 10, 2011, at 6:38 AM, Josh Hursey wrote:
>
>> Another problem with this patch, that I mentioned to Wesley and George
>> off list, is that it does not handle the case when mpirun/HNP is also
>> hosting processes that might fail. In my testing of the patch it
>> worked fine if mpirun/HNP was -not- hosting any processes, but once it
>> had to host processes then unexpected behavior occurred when a process
>> failed. So for those just listening to this thread, Wesley is working
>> on a revised patch to address this problem that he will post when it
>> is ready.
>
> See my other response to the patch - I think we need to understand why we are 
> storing state in multiple places as it can create unexpected behavior when 
> things are out-of-sync.
>
>
>>
>>
>> As far as the RML issue, doesn't the ORTE state machine branch handle
>> that case? If it does, then let's push the solution to that problem
>> until that branch comes around instead of solving it twice.
>
> No, it doesn't - in fact, it's what breaks the current method. Because we no 
> longer allow event recursion, the RML message never gets out of the app. 
> Hence my question.
>
> I honestly don't think we need to have orte be aware of the distinction 
> between "aborted by cmd" and "aborted by signal" as the only diff is in the 
> error message. There ought to be some other way of resolving this?

MPI_Abort will need to tell ORTE which processes should be 'aborted by
signal' along with the calling process. So there needs to be a
mechanism for that as well. Not sure if I have a good solution to
this in mind just yet.

A thought though, in the state machine version, the process calling
MPI_Abort could post a message to the processing thread and return
from the callback. The callback would have a check at the bottom to
determine if MPI_Abort was triggered within the callback, and just
sleep. The processing thread would progress the RML message and once
finished call exit(). This implies that the application process has a
separate processing thread. But I think we might be able to post the
RML message in the callback, then wait for it to complete outside of
the callback before returning control to the user. :/ Interesting.
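
A rough sketch of that last idea, in plain C with no ORTE dependency. Every name here (abort_requested, request_abort, flush_abort_message, event_callback, progress_once) is a hypothetical stand-in, not ORTE or RML API. The callback only records the abort request; the pending message is progressed after the callback has returned, so no event recursion is needed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state; none of these names exist in ORTE. */
static bool abort_requested = false;  /* set inside the callback */
static bool abort_msg_sent  = false;  /* set once the "message" is out */
static int  callback_depth  = 0;      /* > 0 while inside an event callback */

/* Stand-in for MPI_Abort called from inside an event handler:
 * it only posts the request and returns, it never blocks. */
static void request_abort(void)
{
    abort_requested = true;
}

/* Stand-in for progressing the pending RML send. It must only run
 * outside a callback, because event recursion is not allowed. */
static void flush_abort_message(void)
{
    assert(callback_depth == 0);
    abort_msg_sent = true;
}

/* An event callback that decides to abort. */
static void event_callback(void)
{
    callback_depth++;
    request_abort();
    callback_depth--;
}

/* One pass of the event loop. Returns true when the process should exit. */
static bool progress_once(void)
{
    event_callback();
    if (abort_requested) {
        flush_abort_message();  /* safe: the callback has returned */
        return true;            /* the caller would now call exit() */
    }
    return false;
}
```

Whether the real event loop offers a hook at the point where progress_once() checks the flag is exactly the open question.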

-- Josh

>
>
>>
>> -- Josh
>>
>>
>> On Fri, Jun 10, 2011 at 8:22 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>> Something else you might want to address in here: the current code sends an 
>>> RML message from the proc calling abort to its local daemon telling the 
>>> daemon that we are exiting due to the app calling "abort". We needed to do 
>>> this because we wanted to flag the proc termination as one induced by the 
>>> app itself as opposed to something like a segfault or termination by signal.
>>>
>>> However, the problem is that the app may be calling abort from within an 
>>> event handler. Hence, the RML send (which is currently blocking) will never 
>>> complete once we no longer allow event lib recursion (coming soon). If we 
>>> use a non-blocking send, then we can't know for sure that the message has 
>>> been sent before we terminate.
>>>
>>> What we need is a non-messaging way of communicating that this was an 
>>> ordered abort as opposed to a segfault or other failure. Prior to the 
>>> current method, we had the app drop a file that the daemon looked for as an 
>>> "abort marker", but that was ugly as it sometimes caused us to not 
>>> properly clean up the session directory tree.
>>>
>>> I'm open to suggestion - perhaps it isn't actually all that critical for us 
>>> to distinguish "aborted by call to abort" from "aborted by signal", and we 
>>> can just have the app commit suicide via self-imposed SIGKILL? It is only 
>>> the message output to the user at the end of the job that differs - and 
>>> since MPI_Abort already provides a message indicating "we called abort", is 
>>> it really necessary that we have orte aware of that distinction?
>>>
>>>
>>> On Jun 9, 2011, at 6:12 PM, Joshua Hursey wrote:
>>>
>>>>
>>>> On Jun 9, 2011, at 6:47 PM, George Bosilca wrote:
>>>>
>>>>> Well, you're way too trusting. ;)
>>>>
>>>> It's the midwestern boy in me :)
>>>>
>>>>>
>>>>> This only works if all components play the game, and even then it is 
>>>>> difficult if you want to allow components to deregister themselves in the 
>>>>> middle of the execution. The problem is that a callback will be the 
>>>>> "previous" for some other component, so when you want to remove a callback 
>>>>> you have to inform the "next" component on the callback chain to change 
>>>>> its previous.
>>>>
>>>> This is a fair point. I think hiding the ordering of callbacks in the 
>>>> errmgr could be dangerous since it takes control from the upper layers, 
>>>> but, conversely, trusting the upper layers to 'do the right thing' with 
>>>> the previous callback is probably too optimistic, esp. for layers that are 
>>>> not designed together.
>>>>
>>>> To that I would suggest that you leave the code as is - registering a 
>>>> callback overwrites the existing callback. That will allow me to replace 
>>>> the default OMPI callback when I am able to in MPI_Init, and, if I need 
>>>> to, swap back in the default version at MPI_Finalize.
>>>>
>>>> Does that sound like a reasonable way forward on this design point?
>>>>
>>>> -- Josh
>>>>
>>>>>
>>>>> george.
>>>>>
>>>>> On Jun 9, 2011, at 13:21 , Josh Hursey wrote:
>>>>>
>>>>>> So the "Resilient ORTE" patch has a registration in ompi_mpi_init.c:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&ompi_errhandler_runtime_callback);
>>>>>> -------------
>>>>>>
>>>>>> Which is a callback that just calls abort (which is what we want to do
>>>>>> by default):
>>>>>> -------------
>>>>>> void ompi_errhandler_runtime_callback(orte_process_name_t *proc) {
>>>>>>  ompi_mpi_abort(MPI_COMM_WORLD, 1, false);
>>>>>> }
>>>>>> -------------
>>>>>>
>>>>>> This is what I want to replace. I do -not- want ompi to abort just
>>>>>> because a process failed. So I need a way to replace or remove this
>>>>>> callback, and put in my own callback that 'does the right thing'.
>>>>>>
>>>>>> The current patch allows me to overwrite the callback when I call:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&my_callback);
>>>>>> -------------
>>>>>> Which is fine with me.
>>>>>>
>>>>>> At the point I do not want my_callback to be active any more (say in
>>>>>> MPI_Finalize) I would like to replace it with the old callback. To do
>>>>>> so, with the patch's interface, I would have to know what the previous
>>>>>> callback was and do:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&ompi_errhandler_runtime_callback);
>>>>>> -------------
>>>>>>
>>>>>> This comes at a slight maintenance burden since now there will be two
>>>>>> places in the code that must explicitly reference
>>>>>> 'ompi_errhandler_runtime_callback' - if it ever changed then both
>>>>>> sites would have to be updated.
>>>>>>
>>>>>>
>>>>>> If you use the 'sigaction-like' interface then upon registration I
>>>>>> would get the previous handler back (which would point to
>>>>>> 'ompi_errhandler_runtime_callback'), and I can store it for later:
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(&my_callback, &prev_callback);
>>>>>> -------------
>>>>>>
>>>>>> And when it comes time to deregister my callback all I need to do is
>>>>>> replace it with the previous callback - which I have a reference to,
>>>>>> but do not need the explicit name of (passing NULL as the second
>>>>>> argument tells the registration function that I don't care about the
>>>>>> current callback):
>>>>>> -------------
>>>>>> orte_errmgr.set_fault_callback(prev_callback, NULL);
>>>>>> -------------
>>>>>>
>>>>>>
>>>>>> So the API in the patch is fine, and I can work with it. I just
>>>>>> suggested that it might be slightly better to return the previous
>>>>>> callback (as is done in other standard interfaces - e.g., sigaction)
>>>>>> in case we wanted to do something with it later.
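
A minimal, self-contained sketch of the sigaction-like registration described above. The type fault_cb_t, the free function set_fault_callback(), and the helper names are assumptions made for illustration only; they are not the errmgr interface from the patch. The point is that the caller can restore the default handler without ever naming it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names for illustration; not the ORTE errmgr API. */
typedef void (*fault_cb_t)(int failed_rank);

static fault_cb_t current_cb = NULL;

/* Install new_cb; if prev is non-NULL, hand back the callback it replaces. */
static void set_fault_callback(fault_cb_t new_cb, fault_cb_t *prev)
{
    if (prev != NULL) {
        *prev = current_cb;
    }
    current_cb = new_cb;
}

/* Records which handler ran last: 1 = default, 2 = replacement. */
static int last_handler = 0;
static void default_cb(int failed_rank) { (void)failed_rank; last_handler = 1; }
static void my_cb(int failed_rank)      { (void)failed_rank; last_handler = 2; }

/* Deliver a fault to whichever callback is currently installed. */
static void report_fault(int failed_rank)
{
    if (current_cb != NULL) {
        current_cb(failed_rank);
    }
}

static int demo(void)
{
    fault_cb_t saved = NULL;
    int during;

    set_fault_callback(default_cb, NULL);  /* default installed at init */
    set_fault_callback(my_cb, &saved);     /* replace it, remember the old one */
    report_fault(3);
    during = last_handler;                 /* replacement handled the fault */

    set_fault_callback(saved, NULL);       /* restore without naming default_cb */
    report_fault(3);
    return during * 10 + last_handler;
}
```

Only one callback is ever active here, so the default abort handler cannot fire alongside the replacement.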
>>>>>>
>>>>>>
>>>>>> What seems to be proposed now is making the errmgr keep a list of all
>>>>>> registered callbacks and call them in some order. This seems odd, and
>>>>>> definitely more complex. Maybe it was just not well explained.
>>>>>>
>>>>>> Maybe that is just the "computer scientist" in me :)
>>>>>>
>>>>>> -- Josh
>>>>>>
>>>>>>
>>>>>> On Thu, Jun 9, 2011 at 1:05 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>> You mean you want the abort API to point somewhere else, without using
>>>>>>> a new component?
>>>>>>>
>>>>>>> Perhaps a telecon would help resolve this quicker? I'm available
>>>>>>> tomorrow or anytime next week, if that helps.
>>>>>>>
>>>>>>> On Thu, Jun 9, 2011 at 11:02 AM, Josh Hursey <jjhur...@open-mpi.org> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> As long as there is the ability to remove and replace a callback I'm
>>>>>>>> fine. I personally think that forcing the errmgr to track ordering of
>>>>>>>> callback registration makes it a more complex solution, but as long as
>>>>>>>> it works.
>>>>>>>>
>>>>>>>> In particular I need to replace the default 'abort' errmgr call in
>>>>>>>> OMPI with something else. If both are called, then this does not help
>>>>>>>> me at all - since the abort behavior will be activated either before
>>>>>>>> or after my callback. So can you explain how I would do that with the
>>>>>>>> current or the proposed interface?
>>>>>>>>
>>>>>>>> -- Josh
>>>>>>>>
>>>>>>>> On Thu, Jun 9, 2011 at 12:54 PM, Ralph Castain <r...@open-mpi.org> 
>>>>>>>> wrote:
>>>>>>>>> I agree - let's not get overly complex unless we can clearly
>>>>>>>>> articulate a requirement to do so.
>>>>>>>>>
>>>>>>>>> On Thu, Jun 9, 2011 at 10:45 AM, George Bosilca <bosi...@eecs.utk.edu>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> This will require exactly opposite registration and de-registration
>>>>>>>>>> order, or no de-registration at all (aka no way to unload a
>>>>>>>>>> component), or some even more complex code to deal with it
>>>>>>>>>> internally.
>>>>>>>>>>
>>>>>>>>>> If the error manager handles the callbacks, it can use the
>>>>>>>>>> registration ordering (which will be what the approach can do), and
>>>>>>>>>> can enforce that all callbacks will be called. I would prefer this
>>>>>>>>>> approach.
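
For comparison, a minimal sketch of the alternative in which the error manager itself keeps the callbacks and fires all of them in registration order. The names (register_fault_callback, deregister_fault_callback, fire_fault) and the fixed table size are hypothetical, chosen only to illustrate the idea; nothing here is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; names and the fixed capacity are assumptions. */
#define MAX_CALLBACKS 8
typedef void (*fault_cb_t)(int failed_rank);

static fault_cb_t callbacks[MAX_CALLBACKS];
static size_t num_callbacks = 0;

/* Append cb; returns 0 on success, -1 if the table is full. */
static int register_fault_callback(fault_cb_t cb)
{
    if (num_callbacks == MAX_CALLBACKS) {
        return -1;
    }
    callbacks[num_callbacks++] = cb;
    return 0;
}

/* Remove cb from any position, preserving the order of the rest.
 * Returns 0 on success, -1 if cb was not registered. */
static int deregister_fault_callback(fault_cb_t cb)
{
    for (size_t i = 0; i < num_callbacks; i++) {
        if (callbacks[i] == cb) {
            for (size_t j = i + 1; j < num_callbacks; j++) {
                callbacks[j - 1] = callbacks[j];
            }
            num_callbacks--;
            return 0;
        }
    }
    return -1;
}

/* Call every registered callback in registration order. */
static void fire_fault(int failed_rank)
{
    for (size_t i = 0; i < num_callbacks; i++) {
        callbacks[i](failed_rank);
    }
}

/* Each callback appends its own digit so the call order is visible. */
static int trace = 0;
static void cb_a(int r) { (void)r; trace = trace * 10 + 1; }
static void cb_b(int r) { (void)r; trace = trace * 10 + 2; }
static void cb_c(int r) { (void)r; trace = trace * 10 + 3; }

static int demo_list(void)
{
    register_fault_callback(cb_a);
    register_fault_callback(cb_b);
    register_fault_callback(cb_c);
    deregister_fault_callback(cb_b);  /* removed from the middle */
    fire_fault(0);                    /* runs cb_a, then cb_c */
    return trace;
}
```

A component can leave from the middle without its neighbours knowing, but every remaining callback still runs, so a default abort callback has to be deregistered explicitly if it is not wanted.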
>>>>>>>>>>
>>>>>>>>>> george.
>>>>>>>>>>
>>>>>>>>>> On Jun 9, 2011, at 08:36 , Josh Hursey wrote:
>>>>>>>>>>
>>>>>>>>>>> I would prefer returning the previous callback instead of relying on
>>>>>>>>>>> the errmgr to get the ordering right. Additionally, when I want to
>>>>>>>>>>> unregister (or replace) a callback, it is easier to do that with a
>>>>>>>>>>> single interface than to introduce a new one to remove a particular
>>>>>>>>>>> callback.
>>>>>>>>>>> Register:
>>>>>>>>>>> ompi_errmgr.set_fault_callback(my_callback, prev_callback);
>>>>>>>>>>> Deregister:
>>>>>>>>>>> ompi_errmgr.set_fault_callback(prev_callback, old_callback);
>>>>>>>>>>> or to eliminate all callbacks (if you needed that for some reason):
>>>>>>>>>>> ompi_errmgr.set_fault_callback(NULL, old_callback);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> devel mailing list
>>>>>>>>>> de...@open-mpi.org
>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Joshua Hursey
>>>>>>>> Postdoctoral Research Associate
>>>>>>>> Oak Ridge National Laboratory
>>>>>>>> http://users.nccs.gov/~jjhursey
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey
