The return code of your progress function should be related to the activity (send, recv, put, get, etc. completion) on your network. The return value is not really used right now but may be meaningful in the future.
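To make that concrete, here is a minimal sketch of a progress function in plain C, written against a mock network rather than the real BTL interface. Every name in it (mock_frag_t, mock_register_tag, mock_network_deliver, mock_progress, mock_count_bytes) is hypothetical and exists only for this illustration; none of them come from opal/mca/btl/btl.h. It also assumes that "related to the activity" means "the number of completion events handled in this call", which is one reasonable reading and not something the header is quoted as requiring here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; none of these names are
 * part of the Open MPI BTL interface. */
#define MOCK_MAX_TAGS  256
#define MOCK_QUEUE_LEN 16

typedef struct {
    uint8_t     tag;      /* tag the sender attached to the fragment */
    const void *payload;  /* fragment data */
    size_t      len;      /* payload length in bytes */
} mock_frag_t;

typedef void (*mock_recv_cb_t)(const mock_frag_t *frag, void *ctx);

/* One receive callback (plus user context) per tag value. */
static struct { mock_recv_cb_t cb; void *ctx; } mock_tag_table[MOCK_MAX_TAGS];

/* Mock "network": a ring buffer holding fragments that have arrived. */
static mock_frag_t mock_queue[MOCK_QUEUE_LEN];
static size_t mock_head, mock_tail;

/* Associate a receive callback with a tag. */
static void mock_register_tag(uint8_t tag, mock_recv_cb_t cb, void *ctx)
{
    assert(cb != NULL);
    mock_tag_table[tag].cb  = cb;
    mock_tag_table[tag].ctx = ctx;
}

/* Simulate a fragment arriving from the wire.
 * Returns 0 on success, -1 if the mock queue is full. */
static int mock_network_deliver(uint8_t tag, const void *payload, size_t len)
{
    size_t next = (mock_tail + 1) % MOCK_QUEUE_LEN;
    if (next == mock_head) {
        return -1;
    }
    mock_queue[mock_tail] = (mock_frag_t){ tag, payload, len };
    mock_tail = next;
    return 0;
}

/* Progress function: drain the mock network, dispatch each fragment to the
 * callback registered for its tag, and return how many events completed.
 * Fragments whose tag has no callback are dropped and not counted. */
static int mock_progress(void)
{
    int events = 0;
    while (mock_head != mock_tail) {
        const mock_frag_t *frag = &mock_queue[mock_head];
        mock_recv_cb_t cb = mock_tag_table[frag->tag].cb;
        if (cb != NULL) {
            cb(frag, mock_tag_table[frag->tag].ctx);
            events++;
        }
        mock_head = (mock_head + 1) % MOCK_QUEUE_LEN;
    }
    return events;
}

/* Example receive callback: accumulates the number of bytes received. */
static void mock_count_bytes(const mock_frag_t *frag, void *ctx)
{
    *(size_t *)ctx += frag->len;
}
```

To try it, register mock_count_bytes for a tag, push a fragment in with mock_network_deliver(), and call mock_progress(): it returns 0 when nothing is pending and the number of dispatched fragments otherwise. A real BTL would poll its hardware completion queue instead of a ring buffer, but the shape (poll, look up by tag, call back, count) stays the same.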
Your BTL signals progress through two mechanisms:

1) Send completion is indicated by any of the following: your btl_send() function returning 1 (this indicates that no calls to btl_progress() are needed and that the user buffer is no longer needed), your btl_sendi() function returning OPAL_SUCCESS, or you calling the send fragment's callback function. btl_send() is the minimum function needed, but btl_sendi() can provide a faster path to putting a fragment on the network.

2) Receive completion is indicated by calling a callback associated with a fragment's tag. This tag is supplied to btl_send() and btl_sendi() and is usually sent along with the fragment data (typically inline).

A typical progress function polls the network and, on finding an incoming fragment, extracts the BTL tag and calls the associated callback.

It is usually helpful to look at how other BTLs work, but you can also find quite a bit of information in opal/mca/btl/btl.h.

-Nathan

On Fri, May 06, 2016 at 12:01:05AM -0400, dpchoudh . wrote:
> George
>
> Thanks for your help. But what should the progress function return, so
> that the event is signalled? Right now I am returning a 1 when data has
> been transmitted and 0 otherwise, but that does not seem to work. Also,
> please keep in mind that the transport I am working on supports unreliable
> datagrams only, so there is no ack from the recipient to wait for.
>
> Thanks again
> Durga
> The surgeon general advises you to eat right, exercise regularly and quit
> ageing.
> On Thu, May 5, 2016 at 11:33 PM, George Bosilca <bosi...@icl.utk.edu>
> wrote:
>
> Durga,
> TCP doesn't need a specialized progress function because we are tied
> directly with libevent. In your case you should provide a BTL progress
> function, function that will be called at the end of libevent base loop
> regularly.
> George.
> On Thu, May 5, 2016 at 11:30 PM, dpchoudh .
> <dpcho...@gmail.com> wrote:
>
> Hi all
>
> Apologies for a 101 level question again, but here it is:
>
> A new BTL layer I am implementing hangs in MPI_Send(). Please keep in
> mind that at this stage, I am simply desperate to make MPI data move
> through this fabric in any way possible, so I have thrown all good
> programming practice out of the window and in the process might have
> added bugs.
>
> The test code basically has a single call to MPI_Send() with 8 bytes
> of data, the smallest amount the HCA can DMA. I have a very simple
> mca_btl_component_progress() method that returns 0 if called before
> mca_btl_endpoint_send() and returns 1 if called after. I use a static
> variable to keep track whether endpoint_send() has been called.
>
> With this, the MPI process hangs with the following stack:
>
> (gdb) bt
> #0 0x00007f7518c60b7d in poll () from /lib64/libc.so.6
> #1 0x00007f75183e79f6 in poll_dispatch (base=0x19cf480,
> tv=0x7f75177efe80) at poll.c:165
> #2 0x00007f75183df690 in opal_libevent2022_event_base_loop
> (base=0x19cf480, flags=1) at event.c:1630
> #3 0x00007f75183613d4 in progress_engine (obj=0x19cedd8) at
> runtime/opal_progress_threads.c:105
> #4 0x00007f7518f3ddf5 in start_thread () from /lib64/libpthread.so.0
> #5 0x00007f7518c6b1ad in clone () from /lib64/libc.so.6
>
> I am using code from master branch for this work.
>
> Obviously I am not doing the progress handling right, and I don't even
> understand how it should work, as the TCP btl does not even provide a
> component progress function.
>
> Any relevant pointer on how this should be done is highly appreciated.
>
> Thanks
> Durga
>
> The surgeon general advises you to eat right, exercise regularly and
> quit ageing.
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2016/05/18919.php