On Fri, Mar 7, 2008 at 10:39 AM, Charles Taylor <[EMAIL PROTECTED]> wrote:
> Make sure the client can lctl ping the MDS and OSS o2ib nids.

I'm not sure what the output should look like, but the IPoIB addresses
of the MDS and OSS nodes are: 36.122.255.20[1234], and the ping output
from the client looks like:

# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
# lctl ping [EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

> Then
> make sure of the same between the OSSs and the MDS/MGS.

Looks the same from the MDS/OSS's:

# pdsh -w io[1-4] "lctl ping [EMAIL PROTECTED];lctl ping [EMAIL PROTECTED];lctl ping [EMAIL PROTECTED];lctl ping [EMAIL PROTECTED]" | dshbak -c
----------------
io[1-4]
----------------
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]
[EMAIL PROTECTED]

> If all that
> seems fine, I would start to wonder if I made a mistake in specifying
> the nids when formatting the OSTs.

The MDS formatting looked like:

mkfs.lustre --mdt --mgs --mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]" \
    --fsname=ddnlfs --param sys.timeout=40 --param lov.stripesize=2M \
    --stripe-count-hint=8 /dev/md0

The OST's formatting looked like:

for i in a b c d
do
    mkfs.lustre --ost --mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]" \
        --fsname=ddnlfs --param sys.timeout=40 --param lov.stripesize=2M \
        --reformat /dev/sd"$i" &
done

So far, my benchmark results look like everybody is using IB...

I do worry if I'll be able to mount the file system on an Ethernet-only
system (I don't have one yet... I'll try to test with an IB-capable
client, but that could easily generate a false positive).

Thanks!

Chris

>
> ct
>
> On Mar 7, 2008, at 12:17 PM, Canon, Richard Shane wrote:
>
> > Chris,
> >
> > Perhaps you need to perform some write_conf like command. I'm not
> > sure if this is needed in 1.6 or not.
> >
> > Shane
> >
> > ----- Original Message -----
> > From: [EMAIL PROTECTED] <lustre-discuss-[EMAIL PROTECTED]>
> > To: lustre-discuss <[email protected]>
> > Sent: Fri Mar 07 12:03:17 2008
> > Subject: Re: [Lustre-discuss] Multihomed question: want Lustre over
> > IB and Ethernet
> >
> > On Fri, Mar 7, 2008 at 9:39 AM, Craig Prescott
> > <[EMAIL PROTECTED]> wrote:
> >>
> >> I think your client modprobe.conf lnet option
> >> should be this:
> >>
> >> options lnet networks=o2ib(ib0)
> >>
> >> (not 'o2ib0').
> >
> > It still seems to want the TCP connection:
> >
> > Lustre: Added LNI [EMAIL PROTECTED] [8/64]
> > Lustre: Lustre Client File System; [EMAIL PROTECTED]
> > LustreError: 11043:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> > for [EMAIL PROTECTED]
> > LustreError: 11043:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> > find peer [EMAIL PROTECTED]
> > LustreError: 11043:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> > initial connection
> > LustreError: 11043:0:(obd_config.c:325:class_setup()) setup
> > ddnlfs-MDT0000-mdc-0000010430934400 failed (-2)
> > LustreError: 11043:0:(obd_config.c:1062:class_config_llog_handler())
> > Err -2 on cfg command:
> > LustreError: 11141:0:(connection.c:142:ptlrpc_put_connection())
> > NULL connection
> > Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID
> > 2:[EMAIL PROTECTED]
> > LustreError: 15c-8: [EMAIL PROTECTED]: The configuration from log
> > 'ddnlfs-client' failed (-2). This may be the result of communication
> > errors between this node and the MGS, a bad configuration, or other
> > errors. See the syslog for more information.
> > LustreError: 11043:0:(llite_lib.c:1021:ll_fill_super()) Unable to
> > process log: -2
> > LustreError: 11043:0:(obd_config.c:392:class_cleanup()) Device 2
> > not setup
> > Lustre: client 0000010430934400 umount complete
> > LustreError: 11043:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
> > mount (-2)
> >
> >>
> >> Another thing to try, if that doesn't work lctl
> >> ping your MDS/MGS/OSS nids, like so:
> >>
> >> lctl ping [EMAIL PROTECTED]
> >
> > Before and after the change it looks the same:
> >
> > # lctl ping [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> > [EMAIL PROTECTED]
> >
> > If I change my modprobe.conf to look as on the MDS/OSS's:
> >
> > options lnet networks=o2ib0(ib0),tcp0(eth0)
> >
> > Then, mount just specifying o2ib:
> >
> > # mount -t lustre [EMAIL PROTECTED]:/ddnlfs /lfs
> >
> > It works, but, both ko2iblnd and ksocklnd are loaded.
> >
> > The dmesg output is:
> >
> > Lustre: OBD class driver, [EMAIL PROTECTED]
> > Lustre Version: 1.6.4.2
> > Build Version:
> > 1.6.4.2-19691231190000-PRISTINE-.usr.src.linux-2.6.9-67.0.4.EL-
> > Lustre-1.6.4.2
> > Lustre: Added LNI [EMAIL PROTECTED] [8/64]
> > Lustre: Added LNI [EMAIL PROTECTED] [8/256]
> > Lustre: Accept secure, port 988
> > Lustre: Lustre Client File System; [EMAIL PROTECTED]
> > Lustre: ddnlfs-clilov-000001042f8b7c00.lov: set parameter
> > stripesize=2M
> > Lustre: Client ddnlfs-client has started
> >
> > Can I be certain it'll use IB for LFS on this client?
> >
> > Thanks,
> >
> > Chris
> >>
> >> Cheers,
> >> Craig
> >>
> >> Chris Worley wrote:
> >>> More issues. Now, on the clients.
> >>>
> >>> The MDT/MGS/OST's are all up and mounted, showing:
> >>>
> >>> # lctl list_nids
> >>> [EMAIL PROTECTED]
> >>> [EMAIL PROTECTED]
> >>>
> >>> Now, when I go to mount on the IB-based clients, I get:
> >>>
> >>> # mount -t lustre [EMAIL PROTECTED]:/ddnlfs /lfs
> >>> mount.lustre: mount [EMAIL PROTECTED]:/ddnlfs at /lfs failed: No
> >>> such file or directory
> >>> Is the MGS specification correct?
> >>> Is the filesystem name correct?
> >>> If upgrading, is the copied client log valid? (see upgrade docs)
> >>>
> >>> The modprobe.conf contains:
> >>>
> >>> options lnet networks=o2ib0(ib0)
> >>>
> >>> And lctl looks good:
> >>>
> >>> # lctl list_nids
> >>> [EMAIL PROTECTED]
> >>>
> >>> But dmesg shows that it wants to go over the 36.121.x.x (tcp) network
> >>> (36.12[12].255.201 is the MGS/MDS server):
> >>>
> >>> LustreError: 10001:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> >>> for [EMAIL PROTECTED]
> >>> LustreError: 10001:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> >>> find peer [EMAIL PROTECTED]
> >>> LustreError: 10001:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> >>> initial connection
> >>> LustreError: 9836:0:(connection.c:142:ptlrpc_put_connection())
> >>> NULL connection
> >>> LustreError: 10001:0:(obd_config.c:325:class_setup()) setup
> >>> ddnlfs-MDT0000-mdc-0000010430913c00 failed (-2)
> >>> LustreError: 10001:0:(obd_config.c:1062:class_config_llog_handler())
> >>> Err -2 on cfg command:
> >>> Lustre: cmd=cf003 0:ddnlfs-MDT0000-mdc 1:ddnlfs-MDT0000_UUID
> >>> 2:[EMAIL PROTECTED]
> >>> LustreError: 15c-8: [EMAIL PROTECTED]: The configuration from log
> >>> 'ddnlfs-client' failed (-2). This may be the result of communication
> >>> errors between this node and the MGS, a bad configuration, or other
> >>> errors. See the syslog for more information.
> >>> LustreError: 10001:0:(llite_lib.c:1021:ll_fill_super()) Unable to
> >>> process log: -2
> >>> LustreError: 10001:0:(obd_config.c:392:class_cleanup()) Device 2
> >>> not setup
> >>> Lustre: client 0000010430913c00 umount complete
> >>> LustreError: 10001:0:(obd_mount.c:1924:lustre_fill_super()) Unable to
> >>> mount (-2)
> >>>
> >>> Note that this setup works fine in the non-multihomed setup, so I
> >>> don't think ko2iblnd is to blame (the setup on the clients hasn't
> >>> changed at all).
> >>>
> >>> What am I doing wrong?
> >>>
> >>> Thanks,
> >>>
> >>> Chris
> >>> On Fri, Mar 7, 2008 at 7:41 AM, Chris Worley <[EMAIL PROTECTED]> wrote:
> >>>> I changed my modprobe.conf to look exactly as yours, and it worked. I
> >>>> hadn't been using all the quotes until the doc said to... but they may
> >>>> have indeed been the problem.
> >>>>
> >>>> Thanks!
> >>>>
> >>>> Chris
> >>>>
> >>>> On Fri, Mar 7, 2008 at 3:40 AM, Charles Taylor
> >>>> <[EMAIL PROTECTED]> wrote:
> >>>>>
> >>>>> Do "lctl list_nids" on your mds and oss's. They should look
> >>>>> something like this.
> >>>>>
> >>>>> [EMAIL PROTECTED] ~]# lctl list_nids
> >>>>> [EMAIL PROTECTED]
> >>>>> [EMAIL PROTECTED]
> >>>>>
> >>>>> Then your clients should have a nid on one or the other.
> >>>>>
> >>>>> Check your dmesg output after loading lnet. The complaints are
> >>>>> pretty useful. Your modprobe.conf line looks correct although we
> >>>>> found we did not need all the quoting so you should check that as
> >>>>> well. Ours looks like...
> >>>>>
> >>>>> options lnet networks=o2ib(ib0),tcp(eth0)
> >>>>>
> >>>>> My guess is that it either cannot find or does not like your
> >>>>> ko2iblnd module.
> >>>>>
> >>>>> ct
> >>>>>
> >>>>> On Mar 7, 2008, at 12:46 AM, Chris Worley wrote:
> >>>>>
> >>>>>> Most everything is over IB, but I have a few systems I'd like to mount
> >>>>>> the Lustre fs over GigE.
> >>>>>>
> >>>>>> I think I've followed the Multihomed instructions correctly, in:
> >>>>>>
> >>>>>> http://dlc.sun.com/pdf/820-3681/820-3681.pdf
> >>>>>>
> >>>>>> My /etc/modprobe.conf on mds/mgs/oss servers (which all have both
> >>>>>> Ethernet and IB) includes:
> >>>>>>
> >>>>>> options lnet 'networks="tcp0(eth0),o2ib0(ib0)"'
> >>>>>>
> >>>>>> I make and mount the mdt with (which has both IB and Ethernet, subnet
> >>>>>> 36.122.x.x is IB, 36.121.x.x is Ethernet):
> >>>>>>
> >>>>>> # mkfs.lustre --mdt --mgs
> >>>>>> --mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]" <...> /dev/md0
> >>>>>> # mount -t lustre /dev/md0 /lfs/mdtb
> >>>>>>
> >>>>>> But, at this point, the ksocklnd module is loaded rather than the
> >>>>>> ko2iblnd module!
> >>>>>>
> >>>>>> On the OSS, I make the fs w/ the same "mgsnode", but, when I try to
> >>>>>> mount it, it correctly uses the IB interface, but can't contact the
> >>>>>> MDS:
> >>>>>>
> >>>>>> LustreError: 27520:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found
> >>>>>> for [EMAIL PROTECTED]
> >>>>>> LustreError: 27520:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot
> >>>>>> find peer [EMAIL PROTECTED]
> >>>>>> LustreError: 27520:0:(ldlm_lib.c:312:client_obd_setup()) can't add
> >>>>>> initial connection
> >>>>>> LustreError: 17126:0:(connection.c:142:ptlrpc_put_connection())
> >>>>>> NULL connection
> >>>>>> LustreError: 27520:0:(obd_config.c:325:class_setup()) setup
> >>>>>> [EMAIL PROTECTED] failed (-2)
> >>>>>> LustreError: 27520:0:(obd_mount.c:454:lustre_start_simple())
> >>>>>> [EMAIL PROTECTED] setup error -2
> >>>>>> LustreError: 27520:0:(obd_mount.c:1368:server_put_super()) no obd
> >>>>>> ddnlfs-OSTffff
> >>>>>> LustreError: 27520:0:(obd_mount.c:119:server_deregister_mount())
> >>>>>> ddnlfs-OSTffff not registered
> >>>>>>
> >>>>>> It too has loaded the ksocklnd module, and not the ko2iblnd module. I
> >>>>>> guess that both modules should be loaded in a multihomed case?
> >>>>>>
> >>>>>> What am I doing wrong?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Chris
> >>>>>> _______________________________________________
> >>>>>> Lustre-discuss mailing list
> >>>>>> [email protected]
> >>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
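
For anyone following the thread: the lnet "networks" values quoted above decide which LND modules get loaded, which is why a client configured with both o2ib0(ib0) and tcp0(eth0) ends up with both ko2iblnd and ksocklnd. Below is a minimal Python sketch of that mapping. It is only an illustration, not anything Lustre ships: parse_networks and lnd_modules are hypothetical helper names, the type-to-module table covers only the two modules named in this thread, and treating a missing network number as 0 is an assumption of the sketch.

```python
import re

# LNet network type -> kernel LND module, limited to the two named in this thread.
LND_MODULES = {"o2ib": "ko2iblnd", "tcp": "ksocklnd"}

# One entry such as "o2ib0(ib0)": type, optional network number, interface.
ENTRY = re.compile(r"([a-z][a-z0-9]*?)(\d*)\((\w+)\)")


def parse_networks(value):
    """Split a networks= value into (type, number, interface) tuples."""
    entries = []
    for item in value.strip("'\"").split(","):
        match = ENTRY.fullmatch(item.strip())
        if match is None:
            raise ValueError("unrecognised network entry: %r" % item)
        net_type, number, interface = match.groups()
        # Assumption of this sketch: a missing network number means 0.
        entries.append((net_type, int(number or 0), interface))
    return entries


def lnd_modules(value):
    """List, without duplicates, the LND modules a networks= value calls for."""
    modules = []
    for net_type, _number, _interface in parse_networks(value):
        module = LND_MODULES[net_type]
        if module not in modules:
            modules.append(module)
    return modules


print(lnd_modules("o2ib0(ib0),tcp0(eth0)"))  # ['ko2iblnd', 'ksocklnd']
print(lnd_modules("o2ib(ib0)"))              # ['ko2iblnd']
```

This only explains which modules are loaded. It says nothing about which network a given mount actually uses, which is the question still open in the thread.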
