On Saturday 14 February 2009 06:28:44 Vittorio wrote:
> Hello!
> This is my first message on the list, so I hope I'm not asking a silly or
> already answered question ^^
>
> I'm a student and I'm porting an electromagnetic field simulator to a
> parallel and distributed Linux cluster for my final thesis; I'm using both
> OpenMP and MPI over InfiniBand to achieve speed improvements.
>
> The OpenMP part is done, and now I'm facing problems setting up MPI over
> InfiniBand.
> I have correctly set up the kernel modules,
> installed the right drivers for the board (a Mellanox HCA) and userspace
> programs (os
> installed the mvapich2 MPI implementation (thanks to msg [1])
>
> However, I fail to get all of this working together.
> For example, ibhosts correctly finds the two connected nodes:
>
> Ca : 0x0002c90300018b8e ports 2 " HCA-1"
> Ca : 0x0002c90300018b12 ports 2 "localhost HCA-1"
>
> but ibping doesn't receive responses:
>
> ibwarn: [32052] ibping: Ping..
> ibwarn: [32052] mad_rpc_rmpp: _do_madrpc failed; dport (Lid 2)
> ibwarn: [32052] main: ibping to Lid 2 failed
>
> Subsequently, any other operation with MPI fails.
> Strangely enough, however, IPoIB works very well and I can ping and connect
> with no problems.
>
> the two machines are identical and they use a crossover cable (point to
> point)
> 03:00.0 InfiniBand: Mellanox Technologies MT25418 [ConnectX IB DDR, PCIe
> 2.0 2.5GT/s] (rev a0)
>
> What could be the cause of all this? Am I forgetting something?
> Any help is greatly appreciated.
>
>
> For the maintainers: is it possible to have openib-diags installed in
> /usr/bin instead of /usr/sbin? Most of the files call other programs or
> scripts only from /usr/bin.
> I worked around it with:
>
> for x in /usr/sbin/ib*; do ln -s "$x" "/usr/bin/$(basename "$x")"; done
>
> [1]
> http://archives.gentoo.org/gentoo-science/msg_7c030de6ea7ce8673ab90061a066df28.xml
>
>
> thanks a lot
> Vittorio
Hi
First of all, you should make sure that opensm is running.
Once it is, the IB subnet should be usable.
For MPI transport I use openmpi.
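
A rough sketch of that check, in case it helps. It assumes ibstat (from the
InfiniBand diagnostics package) prints one "State:" line per port;
active_ports is a hypothetical helper name, not an existing tool. A port
that stays in "Initializing" usually means no subnet manager has configured
it yet.

```shell
#!/bin/sh
# active_ports: hypothetical helper, not part of any InfiniBand package.
# Reads ibstat-style text on stdin and prints how many ports report
# "State: Active". The "Physical state:" lines are not counted because
# the pattern is anchored to a line starting with "State:".
active_ports() {
    # grep -c exits non-zero when the count is 0; "|| true" keeps the
    # function's exit status at 0 while still printing the count.
    grep -c '^[[:space:]]*State: Active' || true
}

# Possible usage on each node (assumes pgrep and ibstat are installed):
#   pgrep -x opensm >/dev/null || echo "opensm is not running"
#   ibstat | active_ports
```

If the count is 0 on a node even though the cable is plugged in, start
opensm on one of the two machines and check again.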
--
Alexey 'Alexxy' Shvetsov
Gentoo/KDE
Gentoo/MIPS
Gentoo Team Ru
