Subbu, I'd suggest:
1) Make sure ko2iblnd has been brought up (please check whether there are any error messages when ko2iblnd starts up).
2) Run echo +neterror > /proc/sys/lnet/printk, then try lctl ping again. If it still doesn't work, please post the error messages.
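The two checks above could be scripted roughly as follows. This is only a sketch, assuming root access on the node that cannot ping and that lctl, lsmod and dmesg are in PATH. The function names (nid_for, check_o2iblnd, ping_with_neterror) are hypothetical helpers made up for illustration, not part of Lustre, and the peer NID is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of Lustre.
# Nothing runs at load time; call the functions by hand as root.

# Build a NID string from an IP address and an LNet network name
# (the network name defaults to o2ib, as in "options lnet networks=o2ib").
nid_for() {
    printf '%s@%s\n' "$1" "${2:-o2ib}"
}

# Step 1: check that the o2ib LND module is loaded and look for
# startup errors in the kernel log.
check_o2iblnd() {
    if lsmod | grep -q '^ko2iblnd'; then
        echo "ko2iblnd is loaded"
    else
        echo "ko2iblnd is NOT loaded"
    fi
    # Recent kernel messages that mention o2ib or LNet.
    dmesg | grep -i -e o2ib -e lnet | tail -n 20
    # The local node should report an @o2ib NID here.
    lctl list_nids
}

# Step 2: turn on network error messages, retry the ping, and show
# whatever the kernel logged. $1 is the NID to ping.
ping_with_neterror() {
    echo +neterror > /proc/sys/lnet/printk
    lctl ping "$1"
    dmesg | tail -n 20
}
```

After sourcing the file, something like check_o2iblnd followed by ping_with_neterror "$(nid_for <peer ib0 address>)" would run both steps; the dmesg output from the second step is the part worth posting.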

Regards
Liang

subbu kl:
> The problem is similar to
> http://lists.lustre.org/pipermail/lustre-discuss/2008-May/007498.html
> but from reading that thread I could not work out the solution to the
> problem.
>
> I have two RHEL5 Linux servers installed with the following packages:
>
> kernel-lustre-smp-2.6.18-53.1.14.el5_lustre.1.6.5.1
> kernel-ib-1.3-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
> lustre-ldiskfs-3.0.4-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
> lustre-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
> lustre-modules-1.6.5.1-2.6.18_53.1.14.el5_lustre.1.6.5.1smp
> e2fsprogs-1.40.7.sun3-0redhat
>
> machine 1: with ib0 IP address : 172.24.198.111
> machine 2: with ib0 IP address : 172.24.198.112
>
> /etc/modprobe.conf contains
> options lnet networks=o2ib
>
> TCP networking worked fine. Now I am trying an InfiniBand network and
> am having difficulty communicating with the IB nodes. The mount
> attempt gives me the following error:
>
> [r...@p186 ~]# mount -t lustre -o loop /tmp/lustre-ost1 /mnt/ost1
> mount.lustre: mount /dev/loop0 at /mnt/ost1 failed: Input/output error
> Is the MGS running?
>
> /var/log/messages :
> Jan 15 16:55:25 p186 kernel: kjournald starting. Commit interval 5 seconds
> Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Jan 15 16:55:25 p186 kernel: kjournald starting. Commit interval 5 seconds
> Jan 15 16:55:25 p186 kernel: LDISKFS FS on loop0, internal journal
> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: file extents enabled
> Jan 15 16:55:25 p186 kernel: LDISKFS-fs: mballoc enabled
> Jan 15 16:55:30 p186 kernel: Lustre: Request x7 sent from mgc172.24.198....@o2ib to NID 172.24.198....@o2ib 5s ago has timed out (limit 5s).
> Jan 15 16:55:30 p186 kernel: LustreError: 7193:0:(obd_mount.c:1062:server_start_targets()) Required registration failed for lustre-OSTffff: -5
> Jan 15 16:55:30 p186 kernel: LustreError: 15f-b: Communication error with the MGS. Is the MGS running?
> Jan 15 16:55:30 p186 kernel: LustreError: 7193:0:(obd_mount.c:1597:server_fill_super()) Unable to start targets: -5
> Jan 15 16:55:30 p186 kernel: LustreError: 7193:0:(obd_mount.c:1382:server_put_super()) no obd lustre-OSTffff
> Jan 15 16:55:30 p186 kernel: LustreError: 7193:0:(obd_mount.c:119:server_deregister_mount()) lustre-OSTffff not registered
> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 blocks 0 reqs (0 success)
> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 extents scanned, 0 goal hits, 0 2^N hits, 0 breaks, 0 lost
> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 generated and it took 0
> Jan 15 16:55:30 p186 kernel: LDISKFS-fs: mballoc: 0 preallocated, 0 discarded
> Jan 15 16:55:30 p186 kernel: Lustre: server umount lustre-OSTffff complete
> Jan 15 16:55:30 p186 kernel: LustreError: 7193:0:(obd_mount.c:1951:lustre_fill_super()) Unable to mount (-5)
>
> All attempts to ping the IB NIDs, local and remote, also failed.
> I can ping the IP addresses:
> [r...@p186 ~]# ping 172.24.198.112
> PING 172.24.198.112 (172.24.198.112) 56(84) bytes of data.
> 64 bytes from 172.24.198.112: icmp_seq=1 ttl=64 time=0.052 ms
> 64 bytes from 172.24.198.112: icmp_seq=2 ttl=64 time=0.024 ms
>
> --- 172.24.198.112 ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
> rtt min/avg/max/mdev = 0.024/0.038/0.052/0.014 ms
> [r...@p186 ~]# ping 172.24.198.111
> PING 172.24.198.111 (172.24.198.111) 56(84) bytes of data.
> 64 bytes from 172.24.198.111: icmp_seq=1 ttl=64 time=2.16 ms
> 64 bytes from 172.24.198.111: icmp_seq=2 ttl=64 time=0.296 ms
>
> --- 172.24.198.111 ping statistics ---
> 2 packets transmitted, 2 received, 0% packet loss, time 1000ms
> rtt min/avg/max/mdev = 0.296/1.231/2.166/0.935 ms
>
> but I can't ping the NIDs:
> [r...@p186 ~]# lctl ping 172.24.198....@o2ib
> failed to ping 172.24.198....@o2ib: Input/output error
> [r...@p186 ~]# lctl ping 172.24.198....@o2ib
> failed to ping 172.24.198....@o2ib: Input/output error
>
> Any idea why LNet can't ping the NIDs?
>
> Some more configuration details:
> [r...@p186 ~]# ibstat
> CA 'mthca0'
>     CA type: MT23108
>     Number of ports: 2
>     Firmware version: 3.5.0
>     Hardware version: a1
>     Node GUID: 0x0002c9020021550c
>
> The machines are connected via an IB switch.
>
> Looking forward to your help.
>
> ~subbu
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> Lustre-discuss@lists.lustre.org
> http://lists.lustre.org/mailman/listinfo/lustre-discuss