The add_procs subroutine of the BTL should be called.

/* I added a printf in mca_btl_tcp_add_procs and it *is* invoked */

Can you try again with --mca pml ob1 --mca pml_base_verbose 100 ?

Maybe the add_procs subroutine is not invoked because Open MPI uses cm instead of ob1.
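For example, your command line from below with those two flags added (everything else unchanged):

sudo /usr/local/bin/mpirun --allow-run-as-root -hostfile ~/hostfile -np 2 -mca btl self,lf -mca pml ob1 -mca pml_base_verbose 100 -mca btl_base_verbose 100 -mca bml_base_verbose 100 ./mpitest

The pml verbose output will show which PML ends up being selected.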


Cheers,


Gilles


On 4/28/2016 3:07 PM, dpchoudh . wrote:
Hello all

I have been struggling with this issue for the last few days and thought it would be prudent to ask for help from people who have far more experience than I do.

There are two questions, interrelated in my mind, though they may not be so in reality. Question 2 is the issue I am struggling with, and question 1 sort of leads to it.

1. I see that in both the openib and tcp BTLs (the two kinds of hardware I have access to) a modex send happens, but a matching modex receive never happens. Is it because of some kind of optimization? (In my case, both IP NICs are in the same IP subnet and both IB NICs are in the same IB subnet.) Or am I not understanding something? How do the processes figure out their peer information without a modex receive?

The place in the code where the modex receive is called is btl_add_procs(). However, it looks like this method is never called in either of the above BTLs. Is that expected?
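For reference, the receive side I am referring to looks roughly like the sketch below, based on my reading of the tcp BTL. The OPAL_MODEX_RECV usage is written from memory against master (please check opal/mca/pmix/pmix.h), and the lf names are just placeholders for my component:

    /* Sketch only: a helper that add_procs would call while building the
     * endpoint for one remote peer.  The OPAL_MODEX_RECV signature is
     * copied from memory -- verify against opal/mca/pmix/pmix.h. */
    static int mca_btl_lf_recv_modex(opal_proc_t *proc, void **blob, size_t *size)
    {
        int rc;
        /* pull the blob this peer published under our component's version */
        OPAL_MODEX_RECV(rc, &mca_btl_lf_component.super.btl_version,
                        &proc->proc_name, blob, size);
        return rc;
    }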

2. The real question is this:
I am writing a BTL for a proprietary RDMA NIC (named 'lf' in the code) that has no routing capability in its protocol, and hence no concept of subnets. An HCA simply needs to be plugged into the switch and it can see the whole network. However, there is a VLAN-like partitioning scheme (similar to IB partitions). Given this (and, as a first cut, every node is in the same partition, so even that complexity is eliminated), there is not much use for a modex exchange, but I added one anyway, carrying just the partition key.
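Concretely, the modex I added boils down to something like this at component init time (again only a sketch: the lf_pkey field is a placeholder name, and the OPAL_MODEX_SEND signature is from memory):

    /* Sketch only: publish just the partition key during component init. */
    uint32_t pkey = mca_btl_lf_component.lf_pkey;   /* placeholder field name */
    int rc;
    OPAL_MODEX_SEND(rc, OPAL_PMIX_GLOBAL,
                    &mca_btl_lf_component.super.btl_version,
                    &pkey, sizeof(pkey));
    if (OPAL_SUCCESS != rc) {
        return rc;
    }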

What I see is that the component register, open, and init all succeed, but the r2 BML still does not choose this network, and so OMPI aborts for lack of full reachability.
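My understanding is that r2 will only consider a BTL for a given peer if that BTL's add_procs marks the peer in the reachability bitmap, roughly as in the sketch below (the endpoint-creation helper is a placeholder from my component), which is why the add_procs question above matters to me:

    /* Sketch only: what I expect add_procs needs to do for r2 to pick the BTL. */
    static int mca_btl_lf_add_procs(struct mca_btl_base_module_t *btl,
                                    size_t nprocs,
                                    struct opal_proc_t **procs,
                                    struct mca_btl_base_endpoint_t **peers,
                                    opal_bitmap_t *reachable)
    {
        for (size_t i = 0; i < nprocs; i++) {
            if (OPAL_PROC_ON_LOCAL_NODE(procs[i]->proc_flags)) {
                continue;   /* local peers are left to the self/sm BTLs */
            }
            peers[i] = mca_btl_lf_endpoint_create(procs[i]);  /* placeholder helper */
            if (NULL == peers[i]) {
                return OPAL_ERR_OUT_OF_RESOURCE;
            }
            /* without this bit, the r2 bml never selects this BTL for the peer */
            opal_bitmap_set_bit(reachable, i);
        }
        return OPAL_SUCCESS;
    }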

This is my command line:
sudo /usr/local/bin/mpirun --allow-run-as-root -hostfile ~/hostfile -np 2 -mca btl self,lf -mca btl_base_verbose 100 -mca bml_base_verbose 100 ./mpitest

('mpitest' is a trivial 'hello world' program plus ONE MPI_Send()/MPI_Recv() pair to test in-band communication. The sudo is required because the driver currently requires root permission; I was told this will be fixed. The hostfile has 2 hosts, named b-2 and b-3, with a back-to-back connection over the 'lf' HCAs.)
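(For completeness, mpitest is roughly equivalent to the sketch below; the actual source differs in detail, but the printfs and the single MPI_Send()/MPI_Recv() pair are the same idea.)

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len, token = 0;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(host, &len);

        printf("Hello from %s\n", host);
        printf("The world has %d nodes\n", size);
        printf("My rank is %d\n", rank);

        /* one in-band exchange to force traffic through the BTL */
        if (0 == rank) {
            token = 42;
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (1 == rank) {
            MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }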

The output of this command is as follows; I have added my comments to explain it a bit.

<Output from OMPI logging mechanism>
[b-2:21062] mca: base: components_register: registering framework bml components
[b-2:21062] mca: base: components_register: found loaded component r2
[b-2:21062] mca: base: components_register: component r2 register function successful
[b-2:21062] mca: base: components_open: opening bml components
[b-2:21062] mca: base: components_open: found loaded component r2
[b-2:21062] mca: base: components_open: component r2 open function successful
[b-2:21062] mca: base: components_register: registering framework btl components
[b-2:21062] mca: base: components_register: found loaded component self
[b-2:21062] mca: base: components_register: component self register function successful
[b-2:21062] mca: base: components_register: found loaded component lf
[b-2:21062] mca: base: components_register: component lf register function successful
[b-2:21062] mca: base: components_open: opening btl components
[b-2:21062] mca: base: components_open: found loaded component self
[b-2:21062] mca: base: components_open: component self open function successful
[b-2:21062] mca: base: components_open: found loaded component lf

<Debugging output from the HCA driver>
lf_group_lib.c:442: _lf_open: _lf_open("MPI_0",0x842,0x1b6,4096,0)

<Output from OMPI logging mechanism, continued>
[b-2:21062] mca: base: components_open: component lf open function successful
[b-2:21062] select: initializing btl component self
[b-2:21062] select: init of component self returned success
[b-2:21062] select: initializing btl component lf

<Debugging output from the HCA driver>
Created group on b-2

<Output from OMPI logging mechanism, continued>
[b-2:21062] select: init of component lf returned success
[b-3:07672] mca: base: components_register: registering framework bml components
[b-3:07672] mca: base: components_register: found loaded component r2
[b-3:07672] mca: base: components_register: component r2 register function successful
[b-3:07672] mca: base: components_open: opening bml components
[b-3:07672] mca: base: components_open: found loaded component r2
[b-3:07672] mca: base: components_open: component r2 open function successful
[b-3:07672] mca: base: components_register: registering framework btl components
[b-3:07672] mca: base: components_register: found loaded component self
[b-3:07672] mca: base: components_register: component self register function successful
[b-3:07672] mca: base: components_register: found loaded component lf
[b-3:07672] mca: base: components_register: component lf register function successful
[b-3:07672] mca: base: components_open: opening btl components
[b-3:07672] mca: base: components_open: found loaded component self
[b-3:07672] mca: base: components_open: component self open function successful
[b-3:07672] mca: base: components_open: found loaded component lf
[b-3:07672] mca: base: components_open: component lf open function successful
[b-3:07672] select: initializing btl component self
[b-3:07672] select: init of component self returned success
[b-3:07672] select: initializing btl component lf

<Debugging output from the HCA driver>
lf_group_lib.c:442: _lf_open: _lf_open("MPI_0",0x842,0x1b6,4096,0)
Created group on b-3

<Output from OMPI logging mechanism, continued>
[b-3:07672] select: init of component lf returned success
[b-2:21062] mca: bml: Using self btl for send to [[6866,1],0] on node b-2
[b-3:07672] mca: bml: Using self btl for send to [[6866,1],1] on node b-3

<Output from the 'mpitest' MPI program: out-of-band I/O>
Hello from b-2
The world has 2 nodes
My rank is 0
Hello from b-3

<Output from OMPI>
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[6866,1],0]) is on host: b-2
  Process 2 ([[6866,1],1]) is on host: 10.4.70.12
  BTLs attempted: self

Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------

<Output from the 'mpitest' MPI program: out-of-band I/O, continued>
The world has 2 nodes
My rank is 1

<Output from OMPI logging mechanism, continued>
[b-2:21062] *** An error occurred in MPI_Send
[b-2:21062] *** reported by process [140385751007233,21474836480]
[b-2:21062] *** on communicator MPI_COMM_WORLD
[b-2:21062] *** MPI_ERR_INTERN: internal error
[b-2:21062] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[b-2:21062] ***    and potentially your MPI job)
[durga@b-2 ~]$

As you can see, the lf network is not being chosen for communication. Without a modex exchange, how can that happen? Or, in a nutshell, what do I need to do?

Thanks a lot in advance
Durga


1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!

