On Tue, Apr 22, 2008 at 2:21 PM, Cliff White <[EMAIL PROTECTED]> wrote:
> Chris Worley wrote:
>
> > Does anybody have any clues, or do I need to rebuild the entire FS from
> > scratch?
>
> First, what is in your client modprobe.conf? Should only be 'tcp' for
> tcp-only clients.

It is/was:

options lnet networks=tcp0(eth0)

... and this worked fine before I added the new OSS.

> Second, I don't think you can use an ipoib address as a tcp connection.
> If it's ipoib, LNET is going to use o2ib.

I don't quite follow. The specific client doesn't have IB.

The IPoIB addresses in the network are 36.102.x.x. The Ethernet
addresses in the network are 36.101.x.x. Both have 16-bit masks.

The only place I use IPoIB addresses is in the file system creation on
the OSSes, as in:

for i in b c d e f g h i j k l; do mkfs.lustre --ost --mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]" --fsname=lfs --param sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done

... and that has worked well, up until I added another OSS.

Did I do something wrong?

The only thing I know I did wrong was, when I first mounted the created
file systems, I had my new OSS's modprobe.conf set for IB only:

options lnet networks=o2ib(ib0)

I changed that to be the same as my existing OSSes:

options lnet networks=o2ib0(ib0),tcp0(eth0)

...after I realized my Ethernet-only clients weren't working, and
reloaded everything from scratch. At this point, I have unmounted all
clients, unmounted all Lustre OST/MDT file systems on the servers,
removed all Lustre modules from all clients and servers, rebooted the
Ethernet client, then remounted all the file systems everywhere... but
still no joy on the Ethernet-only clients.

At this point I'm guessing that when I made the file systems on the new
OSS, even though I had properly set:

--mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]"

...in the mkfs, the incorrectly set modprobe.conf screwed this mkfs up
irrevocably, and since the file system has been in use from IB clients
after adding the new OSS, my only recourse is to 1) back up the file
system, and 2) rebuild everything (all OSTs and the MDT) from scratch
(mkfs) on all OSSes and the MDS.

Is that correct?

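To make the suspected mismatch concrete, here is a rough Python sketch of the check being reasoned about above: which of a server's registered NIDs sit on a network the client is actually configured for. It is only an illustration, not how LNET resolves peers internally. The function names are made up for this example, the Ethernet NID 36.101.29.4@tcp is hypothetical, 36.102.29.4@o2ib is inferred from the error discussion (the real NIDs are redacted in this thread), and treating a bare "tcp" or "o2ib" as network 0 of that type is an assumption.

```python
import re


def normalize_net(name):
    """Assumption: a bare type such as 'tcp' means network 0 ('tcp0')."""
    return name if name[-1].isdigit() else name + "0"


def parse_lnet_networks(modprobe_line):
    """Return the set of LNET network names found in an
    'options lnet networks=...' line from modprobe.conf."""
    match = re.search(r"networks=(\S+)", modprobe_line)
    if match is None:
        return set()
    spec = match.group(1).strip("\"'")
    # Drop the interface lists, e.g. '(eth0)', so only network names remain.
    spec = re.sub(r"\([^)]*\)", "", spec)
    return {normalize_net(n.strip()) for n in spec.split(",") if n.strip()}


def reachable_nids(client_networks, server_nids):
    """Return the server NIDs whose network the client is configured for."""
    usable = []
    for nid in server_nids:
        _address, _sep, net = nid.partition("@")
        if net and normalize_net(net) in client_networks:
            usable.append(nid)
    return usable


# Ethernet-only client, as configured in this thread.
client = parse_lnet_networks("options lnet networks=tcp0(eth0)")

# Server registered with an IB NID only (inferred from the error discussion).
ib_only = ["36.102.29.4@o2ib"]

# Server registered with both NIDs; the tcp NID is hypothetical.
both = ["36.102.29.4@o2ib", "36.101.29.4@tcp"]

print(reachable_nids(client, ib_only))
print(reachable_nids(client, both))
```

If the new OSS registered itself while only o2ib was configured, the first case is the situation an Ethernet-only client would be in: no NID it can use, which would match the "No NID found" error quoted below.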

Thanks,

Chris

> cliffw
>
> > On Mon, Apr 21, 2008 at 9:31 PM, Chris Worley <[EMAIL PROTECTED]> wrote:
> > > On Mon, Apr 21, 2008 at 9:22 PM, Chris Worley <[EMAIL PROTECTED]> wrote:
> > > > The only configuration error on my OSS was: I initially only had
> > > > "o2ib0(ib0)" in my modprobe.conf. After unmounting all the OSTs, and
> > > > getting the modprobe.conf right:
> > > >
> > > > options lnet networks=o2ib0(ib0),tcp0(eth0)
> > > >
> > > > ...and remounting from scratch, both ksocklnd and ko2iblnd are now
> > > > loaded properly.
> > > >
> > > > But, I still can't mount the partition on the ethernet-only client nodes.
> > > >
> > > > They get the error:
> > > >
> > > > LustreError: 8439:0:(events.c:401:ptlrpc_uuid_to_peer()) No NID found for [EMAIL PROTECTED]
> > > > LustreError: 8439:0:(client.c:58:ptlrpc_uuid_to_connection()) cannot find peer [EMAIL PROTECTED]
> > > > LustreError: 8439:0:(ldlm_lib.c:312:client_obd_setup()) can't add initial connection
> > > > LustreError: 8439:0:(obd_config.c:325:class_setup()) setup lfs-OST0026-osc-0000010753919000 failed (-2)
> > > > LustreError: 8439:0:(obd_config.c:1062:class_config_llog_handler()) Err -2 on cfg command:
> > > > Lustre: cmd=cf003 0:lfs-OST0026-osc 1:lfs-OST0026_UUID 2:[EMAIL PROTECTED]
> > > > LustreError: 15c-8: [EMAIL PROTECTED]: The configuration from log 'lfs-client' failed (-2).
> > > >
> > > > The 36.102.29.4 is the IPoIB address of the added OSS. It shouldn't
> > > > want it "@o2ib".
> > > >
> > > > I've also unmounted all Lustre mounts on the MGS/MDS, unloaded all the
> > > > modules and remounted. Still no joy.
> > >
> > > From this point forward, every time I say "OST" I mean "OSS"...
> > >
> > > > The file systems were created on the new OST, just as on all the others:
> > > >
> > > > for i in b c d e f g h i j k l; do mkfs.lustre --ost
> > > > --mgsnode="[EMAIL PROTECTED],[EMAIL PROTECTED]" --fsname=lfs --param
> > > > sys.timeout=40 --param lov.stripesize=2M /dev/sd$i & done
> > > >
> > > > The client has the right modprobe.conf, which worked before the additional OST:
> > > >
> > > > options lnet networks=tcp0(eth0)
> > > >
> > > > ... and I'm using the same mount command that worked previously:
> > > >
> > > > mount -t lustre [EMAIL PROTECTED]:/lfs /lfs
> > > >
> > > > From the OST I can ping the client:
> > > >
> > > > # lctl list_nids
> > > > [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > # lctl ping [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > >
> > > > From the client, I can ping the OST and MDS/MGS:
> > > >
> > > > # lctl list_nids
> > > > [EMAIL PROTECTED]
> > > > # lctl ping [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > # lctl ping [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > > [EMAIL PROTECTED]
> > > >
> > > > So, somehow, not having the right modprobe.conf the first time I
> > > > mounted the partitions on the new OST has made it permanently not want
> > > > to mount properly on Ethernet clients (it mounts fine on IB clients).
> > > >
> > > > Any ideas?
> > > >
> > > > Thanks,
> > > >
> > > > Chris

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
