Hello Sebastian,

It sounds like you are using the openib BTL as a starting point, which is a good choice. I am curious whether you are indeed targeting a new interconnect (new hardware and a new protocol), or whether it is requirements of the 3D-torus network that are not addressed by the openib BTL that are driving the need for a new BTL?

-DON
On 07/21/09 11:55, Sebastian Rinke wrote:
Hello,

I am developing a new BTL component (against Open MPI v1.3.2) for a new 3D-torus interconnect. During a simple 16362 B message transfer between two nodes with MPI_Send()/MPI_Recv(), I observe the behavior traced below.
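The test is essentially the following minimal reproducer (a sketch of my test case; the exact buffer contents do not matter):

#include <mpi.h>
#include <stdlib.h>

#define MSG_SIZE 16362          /* the message size that triggers the hang */

int main(int argc, char **argv)
{
    int rank;
    char *buf = malloc(MSG_SIZE);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (0 == rank) {
        MPI_Send(buf, MSG_SIZE, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (1 == rank) {
        MPI_Recv(buf, MSG_SIZE, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    free(buf);
    return 0;
}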
The sender:
-----------
1. prepare_src() size: 16304, reserve: 32
   -> alloc() size: 16336
   -> ompi_convertor_pack(): 16304
2. send()
3. component_progress()
   -> send cb()
   -> free()
4. component_progress()
   -> recv cb()
   -> prepare_src() size: 58, reserve: 32
   -> alloc() size: 90
   -> ompi_convertor_pack(): 58
   -> free() size: 90   <-- send() is missing!
5. NO PROGRESS
The receiver:
-------------
1. component_progress()
   -> recv cb()
   -> alloc() size: 32
   -> send()
2. component_progress()
   -> send cb()
   -> free() size: 32
3. component_progress() loops forever!
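Putting the two traces together, the exchange I expect (assuming this is the usual ob1 rendezvous protocol, with the sizes taken from the traces above) is:

sender   -> receiver: RNDV fragment, 32 B header + 16304 B payload
receiver -> sender:   ACK, 32 B
sender   -> receiver: FRAG fragment, 32 B header + 58 B payload  <-- never sent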
The problem is that after prepare_src() for the 2nd fragment, the sender calls free() instead of send() in its recv cb. Thus, the 2nd fragment is never transmitted, and the receiver waits for it indefinitely.

I have found that mca_pml_ob1_recv_frag_callback_ack() is the corresponding recv cb. Before diving into the ob1 code, could you tell me under which conditions this cb leads to free() being called instead of send(), so that I can get an idea of where to look for errors in my BTL component?
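My naive reading of the scheduling path, written out as a simplified and hypothetical sketch rather than the actual ob1 source (the function name, the tag parameter, and the error handling are my assumptions), is roughly:

/* Hypothetical sketch, NOT the actual ob1 code: how I expect ob1 to
 * schedule the remaining 58 B once the ACK has arrived. */
#include "ompi/mca/btl/btl.h"
#include "ompi/constants.h"

static void schedule_remaining(mca_btl_base_module_t *btl,
                               struct mca_btl_base_endpoint_t *endpoint,
                               struct ompi_convertor_t *convertor,
                               mca_btl_base_tag_t frag_tag)
{
    size_t size = 58;     /* bytes left after the first (RNDV) fragment */
    mca_btl_base_descriptor_t *des;

    /* reserve 32 B for the PML header and pack the remaining range */
    des = btl->btl_prepare_src(btl, endpoint, NULL, convertor,
                               MCA_BTL_NO_ORDER, 32, &size, 0);
    if (NULL == des) {
        return;           /* resource shortage: retry on a later progress */
    }

    /* ... the PML fills in its fragment header in the reserved 32 B ... */

    if (OMPI_SUCCESS != btl->btl_send(btl, endpoint, des, frag_tag)) {
        /* the only place where I would expect free() without a
         * transmission: the BTL's send() reported a failure */
        btl->btl_free(btl, des);
    }
}

In my trace, however, send() is not called at all before the free(), which is what puzzles me.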
Thank you very much in advance.
Sebastian Rinke