Hello all,
I have been struggling with this issue for the last few days and
thought it would be prudent to ask for help from people who have far
more experience than I do.
There are two questions. They are related in my mind, but may not be
in reality. Question 2 is the issue I am struggling with, and question
1 sort of leads to it.
1. I see that in both the openib and tcp BTLs (the two kinds of
hardware I have access to) a modex send happens, but a matching modex
receive never happens. Is this because of some kind of optimization?
(In my case, both IP NICs are in the same IP subnet and both IB NICs
are in the same IB subnet.) Or am I misunderstanding something? How do
the processes figure out their peer information without a modex
receive? The place in the code where the modex receive is called is
btl_add_procs(). However, it looks like in both of the above BTLs this
method is never called. Is that expected?
2. The real question is this:
I am writing a BTL for a proprietary RDMA NIC (named 'lf' in the code)
that has no routing capability in its protocol, and hence no concept
of subnets. An HCA simply needs to be plugged in to the switch and it
can see the whole network. However, there is a VLAN-like partitioning
scheme (similar to IB partitions).
Given this (and, as a first cut, every node is in the same partition,
so even this complexity is eliminated), there is not much use for a
modex exchange, but I added one anyway that carries just the partition
key.
What I see is that the component register, open and init are all
successful, but the r2 BML still does not choose this network, and so
OMPI aborts because of a lack of full reachability.
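To make the question concrete, here is a toy model of how I currently understand the flow. It may well be wrong, which is part of what I am asking. Every name in it (toy_modex_t, toy_modex_send, toy_modex_recv, toy_lf_add_procs, toy_all_reachable) is made up for illustration and is not an Open MPI type or function. The assumption is that each process publishes its partition key at init time, that the peer lookup only happens inside the BTL's add_procs, and that a peer counts as reachable only if add_procs marks it in a bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins only: none of these are real Open MPI types or functions. */
#define TOY_MAX_PROCS 64

typedef struct {
    int      published[TOY_MAX_PROCS]; /* 1 once a rank has published its key */
    uint16_t pkey[TOY_MAX_PROCS];      /* partition key published by that rank */
} toy_modex_t;

/* "modex send": a rank publishes its partition key during component init. */
static void toy_modex_send(toy_modex_t *m, size_t rank, uint16_t pkey)
{
    assert(rank < TOY_MAX_PROCS);
    m->published[rank] = 1;
    m->pkey[rank] = pkey;
}

/* "modex recv": look up what a peer published. Returns 0 on success,
 * -1 if the peer is out of range or has not published anything. */
static int toy_modex_recv(const toy_modex_t *m, size_t peer, uint16_t *pkey)
{
    if (peer >= TOY_MAX_PROCS || !m->published[peer]) {
        return -1;
    }
    *pkey = m->pkey[peer];
    return 0;
}

/* Toy add_procs for the 'lf' BTL: for each of the nprocs peers, fetch the
 * published key and, if it matches ours, set that peer's bit in *reachable.
 * Returns the number of peers marked reachable. */
static size_t toy_lf_add_procs(const toy_modex_t *m, uint16_t my_pkey,
                               size_t nprocs, uint64_t *reachable)
{
    size_t marked = 0;
    assert(nprocs <= TOY_MAX_PROCS);
    for (size_t i = 0; i < nprocs; i++) {
        uint16_t peer_pkey;
        if (toy_modex_recv(m, i, &peer_pkey) == 0 && peer_pkey == my_pkey) {
            *reachable |= (uint64_t)1 << i;
            marked++;
        }
    }
    return marked;
}

/* Toy BML-side check: returns 1 only if the bits for all nprocs peers
 * are set in the reachable mask, 0 otherwise. */
static int toy_all_reachable(uint64_t reachable, size_t nprocs)
{
    uint64_t want = (nprocs == 64) ? ~(uint64_t)0
                                   : (((uint64_t)1 << nprocs) - 1);
    return (reachable & want) == want;
}
```

In this model, if toy_lf_add_procs is never called, or is called but marks nothing, the mask stays empty and toy_all_reachable fails. That looks like what I am seeing, so I would like to know which of the two is happening in my BTL.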
This is my command line:
sudo /usr/local/bin/mpirun --allow-run-as-root -hostfile ~/hostfile \
    -np 2 -mca btl self,lf -mca btl_base_verbose 100 \
    -mca bml_base_verbose 100 ./mpitest
('mpitest' is a trivial 'hello world' program plus ONE
MPI_Send()/MPI_Recv() to test in-band communication. The sudo is
required because currently the driver requires root permission; I was
told that this will be fixed. The hostfile has 2 hosts, named b-2 and
b-3, with back-to-back connection on this 'lf' HCA)
The output of this command is as follows; I have added my comments to
explain it a bit.
<Output from OMPI logging mechanism>
[b-2:21062] mca: base: components_register: registering framework bml components
[b-2:21062] mca: base: components_register: found loaded component r2
[b-2:21062] mca: base: components_register: component r2 register function successful
[b-2:21062] mca: base: components_open: opening bml components
[b-2:21062] mca: base: components_open: found loaded component r2
[b-2:21062] mca: base: components_open: component r2 open function successful
[b-2:21062] mca: base: components_register: registering framework btl components
[b-2:21062] mca: base: components_register: found loaded component self
[b-2:21062] mca: base: components_register: component self register function successful
[b-2:21062] mca: base: components_register: found loaded component lf
[b-2:21062] mca: base: components_register: component lf register function successful
[b-2:21062] mca: base: components_open: opening btl components
[b-2:21062] mca: base: components_open: found loaded component self
[b-2:21062] mca: base: components_open: component self open function successful
[b-2:21062] mca: base: components_open: found loaded component lf
<Debugging output from the HCA driver>
lf_group_lib.c:442: _lf_open: _lf_open("MPI_0",0x842,0x1b6,4096,0)
<Output from OMPI logging mechanism, continued>
[b-2:21062] mca: base: components_open: component lf open function successful
[b-2:21062] select: initializing btl component self
[b-2:21062] select: init of component self returned success
[b-2:21062] select: initializing btl component lf
<Debugging output from the HCA driver>
Created group on b-2
<Output from OMPI logging mechanism, continued>
[b-2:21062] select: init of component lf returned success
[b-3:07672] mca: base: components_register: registering framework bml components
[b-3:07672] mca: base: components_register: found loaded component r2
[b-3:07672] mca: base: components_register: component r2 register function successful
[b-3:07672] mca: base: components_open: opening bml components
[b-3:07672] mca: base: components_open: found loaded component r2
[b-3:07672] mca: base: components_open: component r2 open function successful
[b-3:07672] mca: base: components_register: registering framework btl components
[b-3:07672] mca: base: components_register: found loaded component self
[b-3:07672] mca: base: components_register: component self register function successful
[b-3:07672] mca: base: components_register: found loaded component lf
[b-3:07672] mca: base: components_register: component lf register function successful
[b-3:07672] mca: base: components_open: opening btl components
[b-3:07672] mca: base: components_open: found loaded component self
[b-3:07672] mca: base: components_open: component self open function successful
[b-3:07672] mca: base: components_open: found loaded component lf
[b-3:07672] mca: base: components_open: component lf open function successful
[b-3:07672] select: initializing btl component self
[b-3:07672] select: init of component self returned success
[b-3:07672] select: initializing btl component lf
<Debugging output from the HCA driver>
lf_group_lib.c:442: _lf_open: _lf_open("MPI_0",0x842,0x1b6,4096,0)
Created group on b-3
<Output from OMPI logging mechanism, continued>
[b-3:07672] select: init of component lf returned success
[b-2:21062] mca: bml: Using self btl for send to [[6866,1],0] on node b-2
[b-3:07672] mca: bml: Using self btl for send to [[6866,1],1] on node b-3
<Output from the 'mpitest' MPI program: out-of-band-I/O>
Hello from b-2
The world has 2 nodes
My rank is 0
Hello from b-3
<Output from OMPI>
--------------------------------------------------------------------------
At least one pair of MPI processes are unable to reach each other for
MPI communications. This means that no Open MPI device has indicated
that it can be used to communicate between these processes. This is
an error; Open MPI requires that all MPI processes be able to reach
each other. This error can sometimes be the result of forgetting to
specify the "self" BTL.
Process 1 ([[6866,1],0]) is on host: b-2
Process 2 ([[6866,1],1]) is on host: 10.4.70.12
BTLs attempted: self
Your MPI job is now going to abort; sorry.
--------------------------------------------------------------------------
<Output from the 'mpitest' MPI program: out-of-band-I/O, continued>
The world has 2 nodes
My rank is 1
<Output from OMPI logging mechanism, continued>
[b-2:21062] *** An error occurred in MPI_Send
[b-2:21062] *** reported by process [140385751007233,21474836480]
[b-2:21062] *** on communicator MPI_COMM_WORLD
[b-2:21062] *** MPI_ERR_INTERN: internal error
[b-2:21062] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[b-2:21062] *** and potentially your MPI job)
[durga@b-2 ~]$
As you can see, the lf network is not being chosen for communication.
Without a modex exchange, how can that happen? Or, in a nutshell, what
do I need to do?
Thanks a lot in advance
Durga
1% of the executables have 99% of CPU privilege!
Userspace code! Unite!! Occupy the kernel!!!
Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/04/18827.php