Hello!
This is my first message on the list, so I hope I'm not asking a silly or
already-answered question ^^

I'm a student, and for my final thesis I'm porting an electromagnetic field
simulator to a parallel and distributed Linux cluster; I'm using both
OpenMP and MPI over InfiniBand to achieve speed improvements.

The OpenMP part is done, and now I'm facing problems setting up MPI over
InfiniBand. So far I have:
- correctly set up the kernel modules
- installed the right drivers for the board (a Mellanox HCA) and the
  userspace programs (os
- installed the MVAPICH2 MPI implementation (thanks to msg [1])
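Concretely, the module setup above boiled down to something like the
following (module names assume the in-kernel mlx4 stack for this ConnectX
HCA; they may differ with other driver packages):

```shell
# Low-level driver for the ConnectX (MT25418) HCA
modprobe mlx4_core
modprobe mlx4_ib
# Userspace access modules
modprobe ib_umad     # used by the diagnostics (ibhosts, ibping, ...)
modprobe ib_uverbs   # used by MPI's verbs transport
modprobe ib_ipoib    # IP-over-InfiniBand interface
```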

However, I fail to get all of this running together:
for example, ibhosts correctly finds the two connected nodes

Ca    : 0x0002c90300018b8e ports 2 " HCA-1"
Ca    : 0x0002c90300018b12 ports 2 "localhost HCA-1"

but ibping doesn't receive any response:

ibwarn: [32052] ibping: Ping..
ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
ibwarn: [32052] main: ibping to Lid 2 failed
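One thing I'm not sure about: as far as I understand, ibping needs a
responder running on the remote node before the ping can succeed, so maybe
I'm simply missing a step like this (LID 2 is the value reported above):

```shell
# On the remote node: start the ibping responder
ibping -S

# On the local node: ping the remote port by LID
ibping 2
```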

Subsequently, any other operation with MPI fails.
Strangely enough, however, IPoIB works very well and I can ping and connect
with no problems.

The two machines are identical and are connected point to point with a
crossover cable:
03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe 2.0
2.5GT/s] (rev a0)
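In case it's relevant, these are the diagnostics I know of for checking the
basic fabric state on a setup like this (on a back-to-back link a subnet
manager such as opensm has to run on one of the two nodes):

```shell
ibstat     # per-port state, LID, link width and speed
ibstatus   # condensed per-port summary
sminfo     # checks whether a subnet manager is reachable
```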

What could be the cause of all this? Am I forgetting something?
Any help is greatly appreciated.


For the maintainers: would it be possible to have the openib-diags tools
installed in /usr/bin instead of /usr/sbin? Most of them call other
programs or scripts from /usr/bin only.
I worked around it with:

# Symlink every ib* tool from /usr/sbin into /usr/bin
for x in /usr/sbin/ib*; do ln -s "$x" /usr/bin/"$(basename "$x")"; done

[1]
http://archives.gentoo.org/gentoo-science/msg_7c030de6ea7ce8673ab90061a066df28.xml


thanks a lot
Vittorio
