As Joe requested, here is the module list before reloading the OpenIB stack:
[r...@h2o01 ~]# lsmod | grep ib_
ib_srp 38281 0
ib_sdp 54785 0
rdma_cm 39381 2 ib_sdp,rdma_ucm
ib_addr 11081 1 rdma_cm
ib_mthca 158357 0
ib_ipoib 96673 0
ib_umad 20969 0
ib_ucm 20937 0
ib_uverbs 43377 2 rdma_ucm,ib_ucm
ib_cm 42217 4 ib_srp,rdma_cm,ib_ipoib,ib_ucm
ib_sa 48841 4 ib_srp,rdma_cm,ib_ipoib,ib_cm
ib_mad 43497 4 ib_mthca,ib_umad,ib_cm,ib_sa
ib_core 69825 13 ib_srp,ib_sdp,rdma_ucm,rdma_cm,iw_cm,ib_mthca,ib_ipoib,ib_umad,ib_ucm,ib_uverbs,ib_cm,ib_sa,ib_mad
ipv6 285729 29 ib_ipoib
scsi_mod 145425 3 ib_srp,libata,sd_mod
[r...@h2o01 ~]# /etc/init.d/openibd restart
Unloading OpenIB kernel modules: [ OK ]
Loading OpenIB kernel modules: [ OK ]
[r...@h2o01 ~]#
[r...@h2o01 ~]# lsmod | grep ib_
ib_srp 38281 0
ib_sdp 54785 0
ib_ipoib 96673 0
rdma_cm 39381 2 ib_sdp,rdma_ucm
ib_addr 11081 1 rdma_cm
ib_mthca 158357 0
ib_umad 20969 0
ib_ucm 20937 0
ib_uverbs 43377 2 rdma_ucm,ib_ucm
ib_cm 42217 4 ib_srp,ib_ipoib,rdma_cm,ib_ucm
ib_sa 48841 4 ib_srp,ib_ipoib,rdma_cm,ib_cm
ib_mad 43497 4 ib_mthca,ib_umad,ib_cm,ib_sa
ib_core 69825 13 ib_srp,ib_sdp,ib_ipoib,rdma_ucm,rdma_cm,iw_cm,ib_mthca,ib_umad,ib_ucm,ib_uverbs,ib_cm,ib_sa,ib_mad
ipv6 285729 29 ib_ipoib
scsi_mod 145425 3 ib_srp,libata,sd_mod
[r...@h2o01 ~]# ifconfig ib0 up
[r...@h2o01 ~]# ifconfig ib0
ib0 Link encap:UNSPEC HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:192.168.2.1 Bcast:192.168.2.255 Mask:255.255.255.0
UP BROADCAST MULTICAST MTU:2044 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
As you can see, the ib0 interface does come up, and routing seems to be
set up properly:
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
192.168.2.0 * 255.255.255.0 U 0 0 0 ib0
10.84.4.0 * 255.255.255.0 U 0 0 0 eth0
192.168.1.0 * 255.255.255.0 U 0 0 0 eth1
169.254.0.0 * 255.255.0.0 U 0 0 0 ib0
224.0.0.0 * 240.0.0.0 U 0 0 0 eth1
default 10.84.4.1 0.0.0.0 UG 0 0
But if I ping another node over IB, I get nothing back:
[r...@h2o01 ~]# ping h2oi05.cluster
PING h2oi05.cluster (192.168.2.5) 56(84) bytes of data.
From h2oi01.cluster (192.168.2.1) icmp_seq=0 Destination Host Unreachable
From h2oi01.cluster (192.168.2.1) icmp_seq=1 Destination Host Unreachable
From h2oi01.cluster (192.168.2.1) icmp_seq=2 Destination Host Unreachable
From h2oi01.cluster (192.168.2.1) icmp_seq=4 Destination Host Unreachable
From h2oi01.cluster (192.168.2.1) icmp_seq=5 Destination Host Unreachable
From h2oi01.cluster (192.168.2.1) icmp_seq=6 Destination Host Unreachable
--- h2oi05.cluster ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 7000ms
, pipe 4
If I ping the node's own IB address, I get:
[r...@h2o01 ~]# ping h2oi01.cluster
PING h2oi01.cluster (192.168.2.1) 56(84) bytes of data.
64 bytes from h2oi01.cluster (192.168.2.1): icmp_seq=0 ttl=64 time=0.018 ms
64 bytes from h2oi01.cluster (192.168.2.1): icmp_seq=1 ttl=64 time=0.010 ms
64 bytes from h2oi01.cluster (192.168.2.1): icmp_seq=2 ttl=64 time=0.011 ms
64 bytes from h2oi01.cluster (192.168.2.1): icmp_seq=3 ttl=64 time=0.011 ms
64 bytes from h2oi01.cluster (192.168.2.1): icmp_seq=4 ttl=64 time=0.015 ms
64 bytes from h2oi01.cluster (192.168.2.1): icmp_seq=5 ttl=64 time=0.008 ms
--- h2oi01.cluster ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 4999ms
rtt min/avg/max/mdev = 0.008/0.012/0.018/0.004 ms, pipe 2
It appears that the IP stack over IB is installed and up, but it is not
talking on the wire or not passing through the switch.
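
One detail in the ifconfig output above may be relevant: the flags line
reads "UP BROADCAST MULTICAST", with no RUNNING. ifconfig prints RUNNING
only when the driver reports carrier, and for IPoIB that generally requires
the HCA port to be active and the interface to have joined its broadcast
group, both of which depend on a reachable subnet manager. Below is a
minimal sketch for checking both, offered as a starting point rather than a
diagnosis. The helper names and the IB_SYSFS override are hypothetical; the
default path is where the kernel InfiniBand core normally exposes port state.

```shell
#!/bin/sh
# Sketch only: helper names and IB_SYSFS are illustrative, not part of OpenIB.

# Read `ifconfig <iface>` output on stdin; succeed if the RUNNING flag is set.
iface_running() {
    grep -Eq '(^|[[:space:]])RUNNING([[:space:]]|$)'
}

# Print "<device> port <n>: <state>" for every HCA port found in sysfs.
# IB_SYSFS may be overridden for testing; the default is the usual location.
ib_port_states() {
    base="${IB_SYSFS:-/sys/class/infiniband}"
    for f in "$base"/*/ports/*/state; do
        # Skip the unexpanded glob when no HCA is present.
        [ -r "$f" ] || continue
        port_dir=$(dirname "$f")                      # .../<device>/ports/<n>
        port=$(basename "$port_dir")
        dev=$(basename "$(dirname "$(dirname "$port_dir")")")
        printf '%s port %s: %s\n' "$dev" "$port" "$(cat "$f")"
    done
}
```

On a node this could be used as

    ifconfig ib0 | iface_running || echo "ib0 has no carrier"
    ib_port_states

A port that reports a state other than ACTIVE would point at cabling, the
switch, or the subnet manager rather than at the IP configuration.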
jeff
Joe Landman wrote:
jeffrey Lang wrote:
First let me say, I hope this is the right list for this email, if not
please forgive me.
I have a small 16 node compute cluster. The university where I work
recently opened a new Datacenter. My cluster was moved from the old
Datacenter. Before the move the InfiniBand was working properly;
after the move IPoIB has stopped working.
[...]
I've reset the SM on the switch, but nothing seems to work.
Any ideas of where to look for what's causing the problem?
Could you do an
lsmod | grep ib_
I assume you did an
/etc/init.d/openibd restart
If not, now is a good time ... then rerun the lsmod above.
If you don't see ib_ipoib, then you might try this
ifconfig ib0 up
then send the output of
lsmod | grep ib_
ifconfig ib0
If these still don't work, try
modprobe ib_ipoib
ifconfig ib0 up
ifconfig ib0
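
The steps above can be collected into one small script so the same
information is gathered the same way on every node. This is a rough sketch,
assuming the interface is ib0 and the module is ib_ipoib as in the commands
above; module_loaded and collect_ib_state are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative.

# Read `lsmod` output on stdin; succeed if module $1 is in the first column.
# Matching the first column avoids false hits in the "used by" list.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Load ib_ipoib only if it is missing, bring the interface up, then report.
# Must be run as root on a node with the OpenIB stack installed.
collect_ib_state() {
    ifname="${1:-ib0}"
    if ! lsmod | module_loaded ib_ipoib; then
        modprobe ib_ipoib || return 1
    fi
    ifconfig "$ifname" up || return 1
    lsmod | grep ib_
    ifconfig "$ifname"
}
```

Running "collect_ib_state ib0" as root prints the same two listings asked
for above.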