I have a pair of x86/linux (32 bit) hosts connected by Mellanox Tavor HCAs.
I have no idea if (or why) this has only appeared on this system, but I
find that btl:openib thinks the INI file says to ignore these HCAs.  See
the 4th log entry below (the "ignore_device=1" message):


[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_ip.c:364:add_rdma_addr]
Adding addr 172.18.0.105 (0x690012ac) subnet 0xac120000 as mthca0:1
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_ini.c:170:ompi_btl_openib_ini_query]
Querying INI files for vendor 0x02c9, part ID 23108
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_ini.c:189:ompi_btl_openib_ini_query]
Found corresponding INI values: Mellanox Tavor Infinihost
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_component.c:1541:init_one_device]
device mthca0 skipped; ignore_device=1
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_component.c:988:device_destruct]
Failed to release mpool
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/btl_openib_component.c:1020:device_destruct]
Failed to destroy device resources
[pcp-j-5][[27705,1],0][/home/pcp1/phargrov/OMPI/openmpi-1.8.2rc3-linux-x86-mx/openmpi-1.8.2rc3/ompi/mca/btl/openib/connect/btl_openib_connect_rdmacm.c:1981:rdmacm_component_finalize]
rdmacm_component_finalize

Turns out this is known, and has been entered as trac ticket #4377,
currently assigned to miked.
Applying the 2-line patch attached to the ticket fixes the ignore_device=1
problem for me.

Mike,
Please apply that patch to trunk and CMR it for 1.8.2.

BTW:
Even with the "ignore_device=1" problem fixed, I can't get btl:openib
running on x86.
So, there may be additional reports in the next few hours.

-Paul

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
