Hello George and all.

Sorry for the late answer, but I was doing some tracing to see where MPI_ERROR is set. I took a look at ompi_request_default_wait and tried to see what happens with the request.
Well, I've noticed that all requests that are not immediately completed go to ompi_request_wait_completion. But I don't know exactly where the execution jumps when I inject a failure into the receiver of the message. After the failure, the sender does not return from ompi_request_wait_completion to ompi_request_default_wait, and I don't know where to catch the moment when req->req_status.MPI_ERROR is set.

Do you know where the execution jumps, or at least into which error handler?

Thanks in advance.

Hugo

2011/12/9 George Bosilca <bosi...@eecs.utk.edu>

>
> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
>
> Hello George and all.
>
> I've been adapting some of the code to copy the request, and now I think
> that it is working ok. I'm storing the request as you do in the pessimist,
> but I'm only logging received messages, as my approach is a pessimist log
> based on the receiver.
>
> I do have a question about how you detect when you have to resend a
> message, or at least repost it?
>
>
> The error in the status attached to the request will be set in case of
> failure. As the MPI error handler is triggered right before returning above
> the MPI layer, at the level where you placed your interception you have all
> the freedom you need to handle the faults.
>
> george.
>
>
> Thanks for the help.
>
> Hugo
>
> 2011/11/19 Hugo Daniel Meyer <meyer.h...@gmail.com>
>
>>
>>
>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>
>>>
>>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
>>>
>>>
>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>>
>>>>
>>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>>>>
>>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
>>>>
>>>>>
>>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>>>>
>>>>> Hello again.
>>>>>
>>>>> I was doing some tracing in the PML_OB1 files.
>>>>> I started to follow an MPI_Ssend(), trying to find where a message is
>>>>> stored (in the sender) if it is not sent until the receiver posts the
>>>>> recv, but I didn't find that place.
>>>>>
>>>>>
>>>>> Right, you can't find this as the message is not stored on the sender.
>>>>> The pointer to the send request is sent encapsulated in the matching
>>>>> header, and the receiver will provide it back once the message has been
>>>>> matched (this means the data is now ready to flow).
>>>>>
>>>>
>>>> So, what you're saying is that the sender only sends the header, and
>>>> when the receiver posts the recv it sends the header back so that the
>>>> sender starts sending the data? Am I getting it right? If so, the data
>>>> stays in the sender, but where is it stored?
>>>>
>>>>
>>>> If we consider rendez-vous messages, the data remains in the sender
>>>> buffer (aka the buffer provided by the upper level to the MPI_Send
>>>> function).
>>>>
>>>
>>> Yes, so I will only need to save the headers of the messages (where the
>>> status is incomplete), and then maybe just call the upper-level MPI_Send
>>> again. A question here: the headers are not marked as pending (at least I
>>> think so), so my only approach might be to create a list of pending
>>> headers and store there the pointer to the send, then try to identify its
>>> corresponding upper-level MPI_Send and retry it in case of failure. Is
>>> this a correct approach?
>>>
>>>
>>> Look in the mca/vprotocol/base to see how we deal with the send requests
>>> in our message logging protocol. We hijack the send request list, and
>>> replace them with our own, allowing us to chain all active requests. This
>>> makes the tracking of active requests very simple, and minimizes the
>>> impact on the overall code.
>>>
>>> george.
>>>
>>>
>> Ok George.
>> I will take a look there and then let you know how it goes.
>>
>> Thanks.
>>
>> Hugo
>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
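
A note on George's suggestion to chain all active requests and to check the error stored in each request's status: the following is only a minimal, self-contained C sketch of that idea, not Open MPI code. The type sketch_request_t, the SKETCH_* constants, and the functions sketch_track, sketch_sweep and sketch_active_count are all hypothetical names invented for illustration; the error field merely plays the role of req->req_status.MPI_ERROR. It also assumes that a failed request can simply be re-armed in place so that it can be posted again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; these are NOT Open MPI types. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_PEER_FAILED = 1 };

typedef struct sketch_request {
    int complete;                 /* nonzero once the transfer finished or failed */
    int error;                    /* plays the role of req_status.MPI_ERROR */
    int reposts;                  /* how many times this request was re-armed */
    struct sketch_request *next;  /* chains all tracked requests together */
} sketch_request_t;

/* Push a request on the front of the tracked (active) list. */
static void sketch_track(sketch_request_t **active, sketch_request_t *req)
{
    assert(active != NULL && req != NULL);
    req->next = *active;
    *active = req;
}

/* Count the requests currently on the tracked list. */
static int sketch_active_count(const sketch_request_t *active)
{
    int n = 0;
    for (; active != NULL; active = active->next) {
        n++;
    }
    return n;
}

/*
 * Walk the tracked list once:
 *  - requests that completed without error are unlinked;
 *  - requests that completed with an error are reset so they can be
 *    posted again, and stay on the list;
 *  - requests still in flight are left untouched.
 * Returns the number of requests that were re-armed.
 */
static int sketch_sweep(sketch_request_t **active)
{
    int reposted = 0;
    sketch_request_t **link = active;

    while (*link != NULL) {
        sketch_request_t *req = *link;

        if (req->complete && req->error == SKETCH_SUCCESS) {
            *link = req->next;    /* done: drop it from the list */
            req->next = NULL;
            continue;
        }
        if (req->complete && req->error != SKETCH_SUCCESS) {
            req->complete = 0;    /* failed: reset for a new attempt */
            req->error = SKETCH_SUCCESS;
            req->reposts++;
            reposted++;
        }
        link = &req->next;
    }
    return reposted;
}
```

In this sketch a logging layer would call sketch_track when it intercepts a send or receive, and sketch_sweep at the point where it regains control after a wait. Keeping failed requests on the same list is one possible design choice; a separate repost queue would work as well.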