Sorry for the delay. I will try with the MPI_ERRORS_RETURN handler, maybe that is my problem. Thanks a lot for your help.
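For reference, here is roughly the kind of setup being suggested (a minimal
sketch only; the two-rank exchange and the error reporting below are an
illustration, not the actual code under discussion):

/* Switch MPI_COMM_WORLD from the default MPI_ERRORS_ARE_FATAL to
 * MPI_ERRORS_RETURN, so a failed call returns an error code instead of
 * aborting the job, then check the return value of each call. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rc, rank, len, buf = 0;
    char msg[MPI_MAX_ERROR_STRING];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Errors on this communicator are now reported, not fatal. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {
        rc = MPI_Recv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                      MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS) {
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "recv failed: %s\n", msg);
            /* react to the failure here, e.g. repost the receive */
        }
    } else if (rank == 1) {
        MPI_Send(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}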
I'll let you know how it goes.

Best regards,
Hugo

2011/12/16 George Bosilca <bosi...@eecs.utk.edu>

> Setting the error handler to MPI_ERRORS_RETURN is the right solution for
> mechanisms using the PMPI interface. Hugo is one software layer below the
> MPI interface, so the error handler is not affecting his code. However,
> once he reacts to an error, he should reset the error (in the status
> attached to the request) to MPI_SUCCESS, in order to avoid triggering the
> error handler on the way back to the MPI layer.
>
>   george.
>
> On Dec 16, 2011, at 09:09 , Jeff Squyres wrote:
>
> > I'm jumping into the middle of this conversation and probably don't
> > have all the right context, so forgive me if this is a stupid question:
> > did you set MPI_ERRORS_RETURN on the communicator in question?
> >
> > On Dec 14, 2011, at 10:43 AM, Hugo Daniel Meyer wrote:
> >
> >> Hello George and @ll.
> >>
> >> Sorry for the late answer, but I was doing some tracing to see where
> >> MPI_ERROR gets set. I took a look at ompi_request_default_wait and
> >> tried to see what happens with the request.
> >>
> >> Well, I've noticed that all requests that are not immediately completed
> >> go to ompi_request_wait_completion. But I don't know exactly where the
> >> execution jumps when I inject a failure into the receiver of the
> >> message. After the failure, the sender does not return from
> >> ompi_request_wait_completion to ompi_request_default_wait, and I don't
> >> know where to catch the moment when req->req_status.MPI_ERROR is set.
> >> Do you know where the execution jumps, or at least in which error
> >> handler?
> >>
> >> Thanks in advance.
> >>
> >> Hugo
> >>
> >> 2011/12/9 George Bosilca <bosi...@eecs.utk.edu>
> >>
> >> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
> >>
> >>> Hello George and all.
> >>>
> >>> I've been adapting some of the code to copy the request, and now I
> >>> think it is working ok. I'm storing the request as you do in the
> >>> pessimist, but I'm only logging received messages, as my approach is
> >>> a receiver-based pessimistic log.
> >>>
> >>> I do have a question about how you detect when you have to resend a
> >>> message, or at least repost it?
> >>
> >> The error in the status attached to the request will be set in case of
> >> failure. As the MPI error handler is triggered right before returning
> >> above the MPI layer, at the level where you placed your interception
> >> you have all the freedom you need to handle the faults.
> >>
> >>   george.
> >>
> >>>
> >>> Thanks for the help.
> >>>
> >>> Hugo
> >>>
> >>> 2011/11/19 Hugo Daniel Meyer <meyer.h...@gmail.com>
> >>>
> >>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
> >>>
> >>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
> >>>
> >>>>
> >>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
> >>>>
> >>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
> >>>>
> >>>>> 2011/11/18 George Bosilca <bosi...@eecs.utk.edu>
> >>>>>
> >>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
> >>>>>
> >>>>>> Hello again.
> >>>>>>
> >>>>>> I was doing some tracing in the PML_OB1 files. I started to follow
> >>>>>> an MPI_Ssend() trying to find where a message is stored (on the
> >>>>>> sender side) if it is not sent until the receiver posts the recv,
> >>>>>> but I didn't find that place.
> >>>>>
> >>>>> Right, you can't find this as the message is not stored on the
> >>>>> sender. The pointer to the send request is sent encapsulated in the
> >>>>> matching header, and the receiver will provide it back once the
> >>>>> message has been matched (this means the data is now ready to flow).
> >>>>>
> >>>>> So, what you're saying is that the sender only sends the header,
> >>>>> and when the receiver posts the recv it sends the header back so
> >>>>> the sender starts sending the data? Am I getting it right? If this
> >>>>> is so, the data stays in the sender, but where is it stored?
> >>>>
> >>>> If we consider rendez-vous messages, the data remains in the sender
> >>>> buffer (aka the buffer provided by the upper level to the MPI_Send
> >>>> function).
> >>>>
> >>>> Yes, so I will only need to save the headers of the messages (where
> >>>> the status is incomplete), and then maybe just call the upper-level
> >>>> MPI_Send again. A question here: the headers are not marked as
> >>>> pending (at least I think so), so my only approach might be to
> >>>> create a list of pending headers and store there the pointer to the
> >>>> send, then try to identify its corresponding upper-level MPI_Send
> >>>> and retry it in case of failure. Is this a correct approach?
> >>>
> >>> Look in mca/vprotocol/base to see how we deal with the send requests
> >>> in our message logging protocol. We hijack the send request list and
> >>> replace the requests with our own, allowing us to chain all active
> >>> requests. This makes the tracking of active requests very simple, and
> >>> minimizes the impact on the overall code.
> >>>
> >>>   george.
> >>>
> >>> Ok George.
> >>> I will take a look there and then let you know how it goes.
> >>>
> >>> Thanks.
> >>>
> >>> Hugo
> >
> > --
> > Jeff Squyres
> > jsquy...@cisco.com
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
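To make the chained-request idea a bit more concrete, here is a rough sketch
(the struct and helper names below are hypothetical, invented for
illustration; the real implementation is the one George points to in
mca/vprotocol/base): every send request created by the interception layer is
linked onto a list of active sends, so that after a failure the pending ones
can be walked and replayed from the still-valid user buffers.

/* Hypothetical illustration of tracking active send requests on a linked
 * list; not the actual Open MPI vprotocol code. */
#include <stdlib.h>

typedef struct tracked_send {
    struct tracked_send *next;     /* chains all active send requests       */
    void                *request;  /* handle of the underlying send request  */
    const void          *user_buf; /* the data still lives in this buffer    */
    int                  peer, tag, count;
} tracked_send_t;

static tracked_send_t *active_sends = NULL;

/* Record a send when the interception layer creates its request. */
static void track_send(void *request, const void *buf,
                       int count, int peer, int tag)
{
    tracked_send_t *t = malloc(sizeof(*t));
    t->request   = request;
    t->user_buf  = buf;
    t->count     = count;
    t->peer      = peer;
    t->tag       = tag;
    t->next      = active_sends;
    active_sends = t;
}

/* Unlink and free the entry once the send completes successfully;
 * anything still on the list after a failure is a candidate for replay. */
static void untrack_send(void *request)
{
    tracked_send_t **p = &active_sends;
    while (*p != NULL && (*p)->request != request)
        p = &(*p)->next;
    if (*p != NULL) {
        tracked_send_t *done = *p;
        *p = done->next;
        free(done);
    }
}

The sketch uses separate wrapper nodes for clarity; as George describes, the
actual vprotocol code instead hijacks the send request list and replaces the
requests with its own, so active requests can be chained directly without an
extra allocation on the critical path.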