On 12/03/13 23:27, Jeff Squyres (jsquyres) wrote:
> On Nov 22, 2013, at 1:19 PM, Paul Kapinos <[email protected]> wrote:
>
>> Well, I've tried this patch on the current 1.7.3 (where the code has moved some 12 lines, now beginning at line 2700). There is no "skipping device" output at all! The same holds when starting main processes with -bind-to-socket. What I see is:
>>
>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_1, port 1
>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device
>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: found: device mlx4_0, port 1
>> [cluster.rz.RWTH-Aachen.DE:43670] btl:usnic: this is not a usnic-capable device
>
> That's actually ok -- that's from the usnic BTL, not the openib BTL. The usnic BTL is the Cisco UD verbs component, and it only works with Cisco UCS servers and VICs; it will not work with generic IB cards. Hence, these messages are telling you that the usnic BTL is disqualifying itself because the ibv devices it found are not Cisco UCS VICs.

Argh - what a shame that I did not notice the "btl:usnic" prefix :-|

> Look for the openib messages, not the usnic messages.

Well, as said, there were *no messages* from the patch you provided in http://www.open-mpi.org/community/lists/devel/2013/06/12472.php

I've attached the output of a run with a single process per node on nodes with 2 NICs; maybe you can see what goes wrong.
Best

Paul

--
Dipl.-Inform. Paul Kapinos - High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23, D 52074 Aachen (Germany)
Tel: +49 241/80-24915
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register:
registering btl components
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found
loaded component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register:
component self register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found
loaded component sm
--------------------------------------------------------------------------
WARNING: A user-supplied value attempted to override the default-only MCA
variable named "btl_sm_use_knem".
The user-supplied value was ignored.
--------------------------------------------------------------------------
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register:
component sm register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found
loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register:
component openib register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register: found
loaded component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_register:
component usnic register function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: opening btl
components
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found
loaded component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component
self open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found
loaded component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component
sm open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found
loaded component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component
openib open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: found
loaded component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: components_open: component
usnic open function successful
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component self
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component self returned
success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component sm returned
success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component
openib
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: registering
btl components
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded
component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component
self register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded
component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component sm
register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded
component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use
on mlx4_1:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm IP address not found
on port
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC unavailable for
use on mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component
openib register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: found loaded
component usnic
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_register: component
usnic register function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: opening btl
components
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded
component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component self
open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded
component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component sm open
function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded
component openib
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component openib
open function successful
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: found loaded
component usnic
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: components_open: component usnic
open function successful
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component self
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component self returned
success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component sm
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component sm returned success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: oob CPC available for use
on mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] openib BTL: rdmacm CPC available for
use on mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component openib
returned success
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: initializing btl component usnic
[cluster-linux.rz.RWTH-Aachen.DE:19324] found 2 verbs interfaces
[cluster-linux.rz.RWTH-Aachen.DE:19324] examining verbs interface: mlx4_1
[cluster-linux.rz.RWTH-Aachen.DE:19324] found acceptable verbs interface
mlx4_1:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] examining verbs interface: mlx4_0
[cluster-linux.rz.RWTH-Aachen.DE:19324] found acceptable verbs interface
mlx4_0:1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: found: device mlx4_1, port 1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: this is not a usnic-capable
device
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: found: device mlx4_0, port 1
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: this is not a usnic-capable
device
[cluster-linux.rz.RWTH-Aachen.DE:19324] btl:usnic: returning 0 modules
[cluster-linux.rz.RWTH-Aachen.DE:19324] select: init of component usnic
returned failure
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component usnic closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component
usnic
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on
mlx4_1:1
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm IP address not found on
port
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC unavailable for use on
mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on
mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC available for use on
mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component openib returned
success
[cluster.rz.RWTH-Aachen.DE:64279] select: initializing btl component usnic
[cluster.rz.RWTH-Aachen.DE:64279] found 2 verbs interfaces
[cluster.rz.RWTH-Aachen.DE:64279] examining verbs interface: mlx4_1
[cluster.rz.RWTH-Aachen.DE:64279] found acceptable verbs interface mlx4_1:1
[cluster.rz.RWTH-Aachen.DE:64279] examining verbs interface: mlx4_0
[cluster.rz.RWTH-Aachen.DE:64279] found acceptable verbs interface mlx4_0:1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: found: device mlx4_1, port 1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: found: device mlx4_0, port 1
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: returning 0 modules
[cluster.rz.RWTH-Aachen.DE:64279] select: init of component usnic returned
failure
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component usnic closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component usnic
Prozessor 1 on Host: cluster-linux.rz.RWTH-Aachen.DE
Prozessor 0 on Host: cluster.rz.RWTH-Aachen.DE
0 --> 1 Latenz: 0.009 ms, Bandbreite: 1804.681 Mbyte/s
Fertig-ID 0 on Host: cluster.rz.RWTH-Aachen.DE
1 --> 0 Latenz: 0.149 ms, Bandbreite: 1998.477 Mbyte/s
Fertig-ID 1 on Host: cluster-linux.rz.RWTH-Aachen.DE
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component self closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component self
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component sm closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component sm
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: component openib closed
[cluster.rz.RWTH-Aachen.DE:64279] mca: base: close: unloading component openib
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component self closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component
self
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component sm closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component sm
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: component openib
closed
[cluster-linux.rz.RWTH-Aachen.DE:19324] mca: base: close: unloading component
openib
[cluster.rz.RWTH-Aachen.DE:64273] 1 more process has sent help message
help-mca-var.txt / default-only-param-set
[cluster.rz.RWTH-Aachen.DE:64273] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
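
The log above interleaves output from two processes and is hard-wrapped, which makes it awkward to follow the advice to look only at the openib messages. As a rough aid, here is a minimal Python sketch that rejoins wrapped records and groups the ones containing a keyword by host and PID. The function names (`unwrap_records`, `messages_for`) and the `sample` excerpt are made up for illustration and are not part of Open MPI. It assumes every record starts with a "[host:pid]" prefix and that any line without one is a continuation of the previous record, so unprefixed output such as the help banner or the benchmark results would be attached to the preceding record.

```python
import re
from collections import defaultdict

# Matches the "[host:pid] " prefix that starts each verbose record in the log.
RECORD_START = re.compile(r"^\[([^\]:]+):(\d+)\]\s*(.*)$")


def unwrap_records(lines):
    """Rejoin hard-wrapped lines into (host, pid, text) records."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = RECORD_START.match(line)
        if match:
            host, pid, text = match.groups()
            records.append([host, int(pid), text])
        elif records:
            # Assumption: a line without a "[host:pid]" prefix continues
            # the previous record (true for the wrapped lines in this log).
            records[-1][2] += " " + line
    return [tuple(record) for record in records]


def messages_for(records, keyword):
    """Group the texts that contain `keyword` by (host, pid)."""
    grouped = defaultdict(list)
    for host, pid, text in records:
        if keyword in text:
            grouped[(host, pid)].append(text)
    return dict(grouped)


# Short excerpt copied from the log above, including wrapped lines.
sample = """\
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: rdmacm CPC unavailable for use on
mlx4_1:1; skipped
[cluster.rz.RWTH-Aachen.DE:64279] btl:usnic: this is not a usnic-capable device
[cluster.rz.RWTH-Aachen.DE:64279] openib BTL: oob CPC available for use on
mlx4_0:1
""".splitlines()

records = unwrap_records(sample)
for (host, pid), texts in messages_for(records, "openib BTL").items():
    print(f"{host} (pid {pid})")
    for text in texts:
        print("    " + text)
```

Filtering on "openib BTL" rather than plain "openib" keeps only the messages emitted by the openib BTL itself and drops the generic MCA framework lines that merely mention the component name.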
