Make sure the client can lctl ping the MDS and OSS o2ib NIDs. Then check the same between the OSSs and the MDS/MGS. If all that seems fine, I would start to wonder whether I made a mistake in specifying the NIDs when formatting the OSTs.
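For what it's worth, the checks above could be scripted roughly like this. It is only a sketch: check_nids and the LCTL variable are names made up for illustration, and the NIDs in the usage comment are placeholders (the real ones are not shown in this thread), so substitute whatever lctl list_nids reports on your own servers.

```shell
#!/bin/sh
# Sketch only: ping each NID given as an argument and report which ones fail.
# LCTL is a hypothetical override so the helper can be dry-run without Lustre.
LCTL="${LCTL:-lctl}"

check_nids() {
    failed=0
    for nid in "$@"; do
        # "lctl ping <nid>" exits non-zero when the peer cannot be reached.
        if "$LCTL" ping "$nid" >/dev/null 2>&1; then
            echo "ok: $nid"
        else
            echo "FAIL: $nid"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every NID answered.
    [ "$failed" -eq 0 ]
}

# Usage (placeholder NIDs, assumed for illustration only):
#   on the client:   check_nids <mds-ib-addr>@o2ib <oss-ib-addr>@o2ib
#   on each OSS:     check_nids <mds-ib-addr>@o2ib
```

Run it on the client first, then on each OSS against the MDS/MGS. Any FAIL line points at an LNET problem rather than a filesystem configuration problem.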
ct

On Mar 7, 2008, at 12:17 PM, Canon, Richard Shane wrote:

>
> Chris,
>
> Perhaps you need to perform some write_conf like command. I'm not
> sure if this is needed in 1.6 or not.
>
> Shane
>
> ----- Original Message -----
> From: [EMAIL PROTECTED] <lustre-discuss-[EMAIL PROTECTED]>
> To: lustre-discuss <[email protected]>
> Sent: Fri Mar 07 12:03:17 2008
> Subject: Re: [Lustre-discuss] Multihomed question: want Lustre over IB and Ethernet
>
> On Fri, Mar 7, 2008 at 9:39 AM, Craig Prescott <[EMAIL PROTECTED]> wrote:
>>
>> I think your client modprobe.conf lnet option
>> should be this:
>>
>> options lnet networks=o2ib(ib0)
>>
>> (not 'o2ib0').
>
> It still seems to want the TCP connection:
>
> Lustre: Added LNI [EMAIL PROTECTED] [8/64]
> Lustre: Lustre Client File System; [EMAIL PROTECTED]
> LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for [EMAIL PROTECTED]
> LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer [EMAIL PROTECTED]
> LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> LustreError: 11043:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430934400 failed (-2)
> LustreError: 11043:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
> LustreError: 11141:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
> Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:[EMAIL PROTECTED]
> LustreError: 15c-8: [EMAIL PROTECTED]: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
> LustreError: 11043:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
> LustreError: 11043:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
> Lustre: client 0000010430934400 umount complete
> LustreError: 11043:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
>
>>
>> Another thing to try, if that doesn't work lctl
>> ping your MDS/MGS/OSS nids, like so:
>>
>> lctl ping [EMAIL PROTECTED]
>
> Before and after the change it looks the same:
>
> # lctl ping [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
> [EMAIL PROTECTED]
>
> If I change my modprobe.conf to look as on the MDS/OSS's:
>
> options lnet networks=o2ib0(ib0),tcp0(eth0)
>
> Then, mount just specifying o2ib:
>
> # mount -t lustre [EMAIL PROTECTED]:/ddnlfs /lfs
>
> It works, but, both ko2iblnd and ksocklnd are loaded.
>
> The dmesg output is:
>
> Lustre: OBD class driver, [EMAIL PROTECTED]
> Lustre Version: 1.6.4.2
> Build Version: 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-Lustre-1.6.4.2
> Lustre: Added LNI [EMAIL PROTECTED] [8/64]
> Lustre: Added LNI [EMAIL PROTECTED] [8/256]
> Lustre: Accept secure, port 988
> Lustre: Lustre Client File System; [EMAIL PROTECTED]
> Lustre: ddnlfs-clilov-000001042f8b7c00.lov: set parameter stripesize=2M
> Lustre: Client ddnlfs-client has started
>
> Can I be certain it'll use IB for LFS on this client?
>
> Thanks,
>
> Chris
>>
>> Cheers,
>> Craig
>>
>> Chris Worley wrote:
>>> More issues. Now, on the clients.
>>>
>>> The MDT/MGS/OST's are all up and mounted, showing:
>>>
>>> # lctl list_nids
>>> [EMAIL PROTECTED]
>>> [EMAIL PROTECTED]
>>>
>>> Now, when I go to mount on the IB-based clients, I get:
>>>
>>> # mount -t lustre [EMAIL PROTECTED]:/ddnlfs /lfs
>>> mount.lustre: mount [EMAIL PROTECTED]:/ddnlfs at /lfs failed: No such file or directory
>>> Is the MGS specification correct?
>>> Is the filesystem name correct?
>>> If upgrading, is the copied client log valid? (see upgrade docs)
>>>
>>> The modprobe.conf contains:
>>>
>>> options lnet networks=o2ib0(ib0)
>>>
>>> And lctl looks good:
>>>
>>> # lctl list_nids
>>> [EMAIL PROTECTED]
>>>
>>> But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
>>> (36.12[12].255.201 is the MGS/MDS server):
>>>
>>> LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for [EMAIL PROTECTED]
>>> LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer [EMAIL PROTECTED]
>>> LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
>>> LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>>> LustreError: 10001:0:(obd_config.c:325:class_setup()) setup ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
>>> LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
>>> Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID 2:[EMAIL PROTECTED]
>>> LustreError: 15c-8: [EMAIL PROTECTED]: The configuration from log 'ddnlfs-client' failed (-2). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
>>> LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to process log: -2
>>> LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2 not setup
>>> Lustre: client 0000010430913c00 umount complete
>>> LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to mount (-2)
>>>
>>> Note that this setup works fine in the non-multihomed setup, so I
>>> don't think ko2iblnd is to blame (the setup on the clients hasn't
>>> changed at all).
>>>
>>> What am I doing wrong?
>>>
>>> Thanks,
>>>
>>> Chris
>>> On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <[EMAIL PROTECTED]> wrote:
>>>> I changed my modprobe.conf to look exactly as yours, and it worked. I
>>>> hadn't been using all the quotes until the doc said to... but they may
>>>> have indeed been the problem.
>>>>
>>>> Thanks!
>>>>
>>>> Chris
>>>>
>>>> On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> Do "lctl list_nids" on your mds and oss's. They should look
>>>>> something like this.
>>>>>
>>>>> [EMAIL PROTECTED] ~]# lctl list_nids
>>>>> [EMAIL PROTECTED]
>>>>> [EMAIL PROTECTED]
>>>>>
>>>>> Then your clients should have a nid on one or the other.
>>>>>
>>>>> Check your dmesg output after loading lnet. The complaints are
>>>>> pretty useful. Your modprobe.conf line looks correct although we
>>>>> found we did not need all the quoting so you should check that as
>>>>> well. Ours looks like...
>>>>>
>>>>> options lnet networks=o2ib(ib0),tcp(eth0)
>>>>>
>>>>> My guess is that it either cannot find or does not like your ko2iblnd
>>>>> module.
>>>>>
>>>>> ct
>>>>>
>>>>> On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
>>>>>
>>>>>> Most everything is over IB, but I have a few systems I'd like to mount
>>>>>> the Lustre fs over GigE.
>>>>>>
>>>>>> I think I've followed the Multihomed instructions correctly, in:
>>>>>>
>>>>>> http://dlc.sun.com/pdf/820-3681/820-3681.pdf
>>>>>>
>>>>>> My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
>>>>>> Ethernet and IB) includes:
>>>>>>
>>>>>> options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
>>>>>>
>>>>>> I make and mount the mdt with (which has both IB and Ethernet, subnet
>>>>>> 36.122.x.x is IB, 36.121.x.x is Ethernet):
>>>>>>
>>>>>> # mkfs.lustre --mdt --mgs --mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]" <... > /dev/md0
>>>>>> # mount -t lustre /dev/md0 /lfs/mdtb
>>>>>>
>>>>>> But, at this point, the ksocklnd module is loaded rather than the
>>>>>> ko2iblnd module!
>>>>>>
>>>>>> On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
>>>>>> mount it, it correctly uses the IB interface, but can't contact the
>>>>>> MDS:
>>>>>>
>>>>>> LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for [EMAIL PROTECTED]
>>>>>> LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer [EMAIL PROTECTED]
>>>>>> LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
>>>>>> LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection()) NULL connection
>>>>>> LustreError: 27520:0:(obd_config.c:325:class_setup()) setup [EMAIL PROTECTED] failed (-2)
>>>>>> LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple()) [EMAIL PROTECTED] setup error -2
>>>>>> LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd ddnlfs-OSTffff
>>>>>> LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount()) ddnlfs-OSTffff not registered
>>>>>>
>>>>>> It too has loaded the ksocklnd module, and not the ko2iblnd module. I
>>>>>> guess that both modules should be loaded in a multihomed case?
>>>>>>
>>>>>> What am I doing wrong?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Chris
>>>>>> _______________________________________________
>>>>>> Lustre-discuss mailing list
>>>>>> [email protected]
>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
