Hi all,

Apologies for another 101-level question, but here it is:

A new BTL layer I am implementing hangs in MPI_Send(). Please keep in mind that at this stage I am simply desperate to move MPI data through this fabric in any way possible, so I have thrown all good programming practice out the window and may have introduced bugs in the process. The test code is basically a single call to MPI_Send() with 8 bytes of data, the smallest amount the HCA can DMA. I have a very simple mca_btl_component_progress() method that returns 0 if called before mca_btl_endpoint_send() and 1 if called after; I use a static variable to keep track of whether endpoint_send() has been called. With this, the MPI process hangs with the following stack:

(gdb) bt
#0  0x00007f7518c60b7d in poll () from /lib64/libc.so.6
#1  0x00007f75183e79f6 in poll_dispatch (base=0x19cf480, tv=0x7f75177efe80) at poll.c:165
#2  0x00007f75183df690 in opal_libevent2022_event_base_loop (base=0x19cf480, flags=1) at event.c:1630
#3  0x00007f75183613d4 in progress_engine (obj=0x19cedd8) at runtime/opal_progress_threads.c:105
#4  0x00007f7518f3ddf5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f7518c6b1ad in clone () from /lib64/libc.so.6

I am using code from the master branch for this work. Obviously I am not handling progress correctly, and I don't even understand how it is supposed to work, since the TCP BTL does not even provide a component progress function. Any pointers on how this should be done are highly appreciated.

Thanks
Durga

The surgeon general advises you to eat right, exercise regularly and quit ageing.
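P.S. For concreteness, the test code is essentially the following (a minimal sketch; the MPI_UINT64_T datatype and the matching receive on rank 1 are my guesses at the obvious counterpart, nothing special):

    /* Minimal reproducer: one MPI_Send() of 8 bytes, the smallest
     * amount the HCA can DMA. Build with mpicc, run with 2 ranks. */
    #include <mpi.h>
    #include <stdint.h>

    int main(int argc, char **argv)
    {
        uint64_t payload = 0;   /* 8 bytes */
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (0 == rank) {
            /* This call never returns with my BTL. */
            MPI_Send(&payload, 1, MPI_UINT64_T, 1, 0, MPI_COMM_WORLD);
        } else if (1 == rank) {
            MPI_Recv(&payload, 1, MPI_UINT64_T, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }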
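And the progress function is essentially this ("mybtl" is a placeholder name; the function is hooked up through the btl_progress member of my component struct, if I have understood the framework correctly):

    /* Set by mca_btl_mybtl_endpoint_send() once the descriptor has been
     * handed to the hardware; a stand-in for real completion detection. */
    static volatile int mca_btl_mybtl_send_posted = 0;

    /* Installed as the component's btl_progress callback; returns the
     * number of events handled on this pass. */
    int mca_btl_mybtl_component_progress(void)
    {
        /* Returns 0 before endpoint_send() has run, 1 afterwards. */
        return mca_btl_mybtl_send_posted ? 1 : 0;
    }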
A new BTL layer I am implementing hangs in MPI_Send(). Please keep in mind that at this stage, I am simply desperate to make MPI data move through this fabric in any way possible, so I have thrown all good programming practice out of the window and in the process might have added bugs. The test code basically has a single call to MPI_Send() with 8 bytes of data, the smallest amount the HCA can DMA. I have a very simple mca_btl_component_progress() method that returns 0 if called before mca_btl_endpoint_send() and returns 1 if called after. I use a static variable to keep track whether endpoint_send() has been called. With this, the MPI process hangs with the following stack: (gdb) bt #0 0x00007f7518c60b7d in poll () from /lib64/libc.so.6 #1 0x00007f75183e79f6 in poll_dispatch (base=0x19cf480, tv=0x7f75177efe80) at poll.c:165 #2 0x00007f75183df690 in opal_libevent2022_event_base_loop (base=0x19cf480, flags=1) at event.c:1630 #3 0x00007f75183613d4 in progress_engine (obj=0x19cedd8) at runtime/opal_progress_threads.c:105 #4 0x00007f7518f3ddf5 in start_thread () from /lib64/libpthread.so.0 #5 0x00007f7518c6b1ad in clone () from /lib64/libc.so.6 I am using code from master branch for this work. Obviously I am not doing the progress handling right, and I don't even understand how it should work, as the TCP btl does not even provide a component progress function. Any relevant pointer on how this should be done is highly appreciated. Thanks Durga The surgeon general advises you to eat right, exercise regularly and quit ageing.