I see the same issue with my Mellanox OFED 1.3. The IBCM module is loaded, but there is no such device in the system.
Jeff, it looks like a bug in the IBCM stuff (not ompi).
I think we should print the error only if ibcm was explicitly selected by the user. But from the cpc level there is no way to know
about explicit selection... Maybe just hide the print?
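
To illustrate the "hide the print" idea, here is a minimal sketch in C. It is not the actual Open MPI code: the names cm_device_available and query_ibcm and the verbose flag are hypothetical, and the check uses only the POSIX access() call. It assumes that a missing device node should simply make the query fail quietly unless verbosity was requested.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): returns 1 if the device
 * node at `path` exists and is readable and writable, 0 otherwise. */
static int cm_device_available(const char *path)
{
    assert(path != NULL);
    return access(path, R_OK | W_OK) == 0;
}

/* Hypothetical query routine: returns 0 if the CM device is usable and
 * -1 otherwise. The message is printed only when `verbose` is nonzero,
 * so a node without the device stays silent by default. */
static int query_ibcm(const char *path, int verbose)
{
    if (cm_device_available(path)) {
        return 0;
    }
    if (verbose) {
        fprintf(stderr, "IB CM device not available: %s (skipping ibcm)\n",
                path);
    }
    return -1;
}
```

With this shape, a caller would pass the device path (for example /dev/infiniband/ucm0) and fall back to another connection method when the result is -1, without alarming users who never asked for ibcm.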

Bogdan Costescu wrote:
On Thu, 3 Jul 2008, Pavel Shamis (Pasha) wrote:

I had a similar issue recently. It would be nice to have an option to disable/enable *CM via config flags.

Not sure if this is related... I am looking at using a nightly 1.3 snapshot and I get this type of error message when running:

[n020205][[36506,1],0][connect/btl_openib_connect_ibcm.c:723:ibcm_component_query] failed to open IB CM device: /dev/infiniband/ucm0

which is actually right, as /dev/infiniband on the nodes doesn't contain ucm0. On the same cluster, OpenMPI 1.2.7rc2 works fine; the configure options for building them are the same.

The output of ldd shows for the binary linked with 1.3a:

libibcm.so.1 => /opt/ofed/1.2.5.4/lib64/libibcm.so.1

while this is missing from the binary linked with 1.2. Even after printing these messages, the binary linked with 1.3a works; it works even when I specify "--mca btl openib,self" so I think that the IB stack is still being used (there is also a TCP/GigE network which could be chosen otherwise).

I don't know whether this is caused by a somehow inconsistent setup of the system, but I would welcome an option to make 1.3a behave like 1.2.

