OK, can you provide more insight? I'm using the same distro, kernel, and Lustre RPMs on all the servers. Why would the modules load on one server but not on the others? And a more practical point: which target do I build? make, make install, or make rpms? Thx
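One way to compare the servers might be to pull the mismatched symbol names out of each server's /var/log/messages and diff the lists. What follows is only a rough sketch: `mismatched_symbols` is a made-up helper name (it is not part of Lustre or OFED), and it assumes the log lines look like the ko2iblnd messages quoted below.

```shell
#!/bin/sh
# Hypothetical helper, not a Lustre or OFED tool: read syslog text on stdin
# and print the unique symbols that ko2iblnd reported as version-mismatched.
mismatched_symbols() {
    # Keep only the "disagrees about version of symbol" lines, print the
    # last field of each (the symbol name), then sort and de-duplicate.
    grep 'ko2iblnd: disagrees about version of symbol' |
        awk '{ print $NF }' |
        sort -u
}

# Sample input copied from the log excerpt in this thread.
mismatched_symbols <<'EOF'
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
EOF
```

On a real server the same function could be fed with `mismatched_symbols < /var/log/messages`. A server where ko2iblnd loads cleanly should produce an empty list.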
-----Original Message-----
From: Nathaniel Rutman [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 06, 2007 4:25 PM
To: Snider, Tim
Cc: Eric Barton; [email protected]
Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95

This is strictly a compile issue -- Lustre won't work over o2ib until
the ko2iblnd module can load successfully. The default header path the
o2iblnd uses is $LINUX/drivers/infiniband - you need to make sure
Lustre is compiled against the o2ib/OFED headers that your kernel
modules actually use. The ./configure flag for Lustre is:

    --with-o2ib=path    build o2iblnd against path

HTH

Snider, Tim wrote:
> Ok - more details. ipoib itself is working on all servers. there are
> ipoib ping utilities that run successfully between all the servers in
> the fabric.
> I was able to successfully mount on the mdt/mgs after installing
> Lustre modules by hand using the force option.
> Mounting the OST device still fails. ptlrpc refuses to load manually
> with the force option. All kernel / lustre versions are identical
> between the servers.
>
> What am I missing?
>
> uname -a
> Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux
> [EMAIL PROTECTED] mnt]# modprobe -vf ptlrpc
> insmod /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko
> FATAL: Error inserting ptlrpc (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko): Input/output error
> /var/log/messages
> Feb 6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic, tainting kernel.
> Feb 6 17:03:20 FedoraCore120 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_create_cq
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_reject
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_disconnect
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
> Feb 6 17:03:20 FedoraCore120 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
> Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
> <<<similar messages are displayed for awhile same as before>>>
> Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
> Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
> Feb 6 17:03:21 FedoraCore120 kernel: Lustre: Removed LNI [EMAIL PROTECTED]
> Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(events.c:581:ptlrpc_init_portals()) network initialisation failed
>
> ------------------------------------------------------------------------
> *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] *On Behalf Of *Snider, Tim
> *Sent:* Tuesday, February 06, 2007 10:19 AM
> *To:* Eric Barton; [email protected]
> *Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>
> I can successfully ping other servers thru ib using ipoib ip addresses.
> Loading lnet or trying to mount a lustre device using o2ib using OFED
> 1.1.1
> modprobe lnet generates complaints about symbol versions of ib related
> routines.
> What versions of the OFED driver (1.0, 1.1, or 1.1.1) are compatible
> with Lustre 1.5.95?
>
> Thanks for the advice.
> Tim
>
> /etc/modprobe.conf
> alias eth0 tg3
> alias eth1 tg3
> alias scsi_hostadapter mptbase
> alias scsi_hostadapter1 mptscsih
> alias usb-controller ohci-hcd
> options lnet networks=tcp,o2ib # specify both ethernet and ib networks for Lustre.
> alias ib0 ib_ipoib
> alias ib1 ib_ipoib
> alias net-pf-27 ib_sdp
>
> Sample of messages:
> Feb 6 14:34:21 FedoraCore121 root: =========start lnet and debug
> Feb 6 14:34:27 FedoraCore121 kernel: Lustre: 2306:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
> Feb 6 14:34:46 FedoraCore121 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_create_cq
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_reject
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_disconnect
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
> Feb 6 14:34:46 FedoraCore121 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_qp
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_destroy_cq
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_id
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_id
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_listen
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_listen
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_qp
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_get_dma_mr
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_pd
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_alloc_pd
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_connect
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_connect
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_modify_qp
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_id
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_accept
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_accept
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
> Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
> Feb 6 14:34:47 FedoraCore121 kernel: Lustre: Removed LNI [EMAIL PROTECTED]
> Feb 6 14:35:01 FedoraCore121 sendmail[2268]: sql_select option missing
> Feb 6 14:35:01 FedoraCore121 sendmail[2268]: auxpropfunc error no mechanism available
>
> ------------------------------------------------------------------------
> *From:* Eric Barton [mailto:[EMAIL PROTECTED]
> *Sent:* Monday, February 05, 2007 10:42 AM
> *To:* Snider, Tim; [email protected]
> *Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>
> Is that OFED 1.1? Does /etc/modprobe.conf contain...
>
>     options lnet networks=o2ib
>
> ...or the equivalent using ip2nets? If this isn't clear, please see
> the lustre manual for an explanation of network setup.
>
> Can you bring up lustre networking on the mgs and a client node...
>
>     modprobe lnet; lctl net up
>
> ...and then check /proc/sys/lnet/nis? It should list the local NIDs
> (e.g....
>
>     <ipoib IP address>@o2ib
>     [EMAIL PROTECTED]
>
> ...). If that looks OK, run an lnet ping from the client to the MGS...
>
>     lctl ping [EMAIL PROTECTED]
>
> Please note that by default, network error messages are logged
> internally, but are not printed to the console or /var/log/messages,
> so it may help to "echo + neterror > /proc/sys/lnet/printk" to enable
> verbose network messages while you are debugging connectivity.
>
> Cheers,
> Eric
>
> ------------------------------------------------------------------------
> *From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] *On Behalf Of *Snider, Tim
> *Sent:* 05 February 2007 2:40 PM
> *To:* [email protected]
> *Subject:* [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95
>
> We're trying to set up a Lustre configuration using infiniband
> ipoib with 1.5.95. openib 1.1 (was formerly openib gen 2) is
> installed.
> We can successfully ping between the mdt/mgs and ost
> servers using the ipoib address. Lustre fs creation is
> "apparently" successful. Mounting the lustre device fails.
> 1. Does 1.5.95 work properly with ipoib?
> 2. What is the proper form of mgsnode specification, should
>    o2ib or openiib be used?
> 2.a Should we specify the ipoib address or the adapter/port #?
>
> The ost command line we're trying is:
> mkfs.lustre --fsname=testfs [EMAIL PROTECTED] /dev/sdb1
>
> Thanks,
> Timothy Snider
> Storage Architect
> Strategic Planning, Technology and Architecture
>
> LSI Logic Corporation
> 3718 North Rock Road
> Wichita, KS 67226
> (316) 636-8736
> [EMAIL PROTECTED]
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
