Robin Humble wrote:
Hi Tim,

On Wed, Feb 07, 2007 at 06:11:39AM -0700, Snider, Tim wrote:
Ok - Can you provide more insight?  I'm using the same distro, kernel,

I'm using 1.5.97 (beta7) which unlike beta5 has (possibly useless) IB
modules in the RHEL kernel rpms from clusterfs.

can you please try 1.5.97 and let me know how you go?

Ah yes, I think the latest RHEL kernels now include IB, in which case we should be compiling and distributing the matching ib lnd -- eeb / scjody do you know more about this?
I'm using the same distro, kernel, and Lustre RPMs on all the servers.
Why would modules load on one server but not the others?
They wouldn't. If the kernels are identical, the Lustre modules will either load everywhere or get symbol conflicts everywhere. You could probably "make rpm" in the kernel source directory from a working kernel and install it on your non-working nodes.

And a more practical point: what target do I build?
make
make install
make rpms?
./configure --with-o2ib=/path/to/ib/headers
make install
should do it.

I was hoping to avoid all that :-/ hence my previous email.

it's not clear to me what's the best order in which to build/install
new OFED and patch the RHEL kernel build tree with Lustre. something to
keep me amused today.

I would lustre-patch first, build/install ofed, build the kernel, and then build lustre. [Disclaimer: I've never actually done this myself. :( But maybe eeb or scjody can add something here.]
cheers,
robin

Thx

-----Original Message-----
From: Nathaniel Rutman [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 06, 2007 4:25 PM
To: Snider, Tim
Cc: Eric Barton; [email protected]
Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with 1.5.95

This is strictly a compile issue -- Lustre won't work over o2ib until
the ko2iblnd module can load successfully.
The default header path the o2iblnd uses is $LINUX/drivers/infiniband -
you need to make sure Lustre is compiled against the o2ib/OFED headers
that your kernel modules actually use.  The ./configure flag for Lustre
is:
 --with-o2ib=path        build o2iblnd against path
HTH
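
One way to confirm a header mismatch of this kind is to compare the symbol CRCs that ko2iblnd.ko was built against with the CRCs the installed IB stack exports. The following is only a minimal sketch, assuming both lists have already been saved as text files with one "CRC symbol" pair per line (for example the output of modprobe --dump-modversions on the module, and the Module.symvers from the IB build). The function name crc_mismatches and the file names are hypothetical.

```shell
#!/bin/sh
# crc_mismatches EXPECTED EXPORTED
# Both files hold one "CRC symbol" pair per line; extra columns are ignored.
# Prints every symbol that appears in both files with a different CRC.
crc_mismatches() {
    awk 'NR == FNR { exported[$2] = $1; next }
         ($2 in exported) && exported[$2] != $1 { print $2 }' "$2" "$1"
}

# Hypothetical usage (paths are placeholders):
#   modprobe --dump-modversions /path/to/ko2iblnd.ko > expected.txt
#   crc_mismatches expected.txt /path/to/ib/build/Module.symvers
```

Any symbol printed here was compiled against one version of the IB headers but is exported by a different one, which is the situation the --with-o2ib path is meant to correct.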


Snider, Tim wrote:
Ok - more details. ipoib itself is working on all servers. there are
ipoib ping utilities that run successfully between all the servers in
the fabric. I was able to successfully mount on the mdt/mgs after
installing Lustre modules by hand using the force option. Mounting the
OST device still fails. ptlrpc refuses to load manually with the force
option. All kernel / lustre versions are identical between the servers.
What am I missing?

uname -a
Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux

[EMAIL PROTECTED] mnt]# modprobe -vf ptlrpc
        insmod /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko
        FATAL: Error inserting ptlrpc (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko): Input/output error

/var/log/messages
Feb 6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic, tainting kernel.
Feb 6 17:03:20 FedoraCore120 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_reject
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_disconnect
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
Feb 6 17:03:20 FedoraCore120 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
                <<<similar messages are displayed for awhile same as before>>>
Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, module ko2iblnd, rc=256
Feb 6 17:03:21 FedoraCore120 kernel: Lustre: Removed LNI [EMAIL PROTECTED]
Feb 6 17:03:21 FedoraCore120 kernel: LustreError: 4753:0:(events.c:581:ptlrpc_init_portals()) network initialisation failed
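
A log like this can be reduced to the distinct list of symbols that ko2iblnd disagrees about, which makes it easier to see that every one of them belongs to the IB/RDMA stack. This is a minimal sketch that only does text processing on syslog output. symbol_mismatches is a hypothetical helper name, not a Lustre tool.

```shell
#!/bin/sh
# symbol_mismatches: read syslog text on stdin and print each symbol named in
# a "disagrees about version of symbol" message, one per line, no duplicates.
symbol_mismatches() {
    grep -o 'disagrees about version of symbol [A-Za-z0-9_]*' |
        awk '{ print $NF }' |
        sort -u
}

# Hypothetical usage:
#   symbol_mismatches < /var/log/messages
```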
------------------------------------------------------------------------
*From:* [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] *On Behalf Of *Snider, Tim
*Sent:* Tuesday, February 06, 2007 10:19 AM
*To:* Eric Barton; [email protected]
*Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with
1.5.95

I can successfully ping other servers through ib using their ipoib ip
addresses.
With OFED 1.1.1 installed, loading lnet (modprobe lnet) or trying to
mount a lustre device over o2ib generates complaints about symbol
versions of ib-related routines. What versions of the OFED driver
(1.0, 1.1, or 1.1.1) are compatible with Lustre 1.5.95? Thanks for the
advice.
Tim
/etc/modprobe.conf
 alias eth0 tg3
 alias eth1 tg3
 alias scsi_hostadapter mptbase
 alias scsi_hostadapter1 mptscsih
 alias usb-controller ohci-hcd
options lnet networks=tcp,o2ib # specify both ethernet and ib networks for Lustre.
 alias ib0 ib_ipoib
 alias ib1 ib_ipoib
 alias net-pf-27 ib_sdp
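
To check which networks LNET is being told to use, the networks= value can be pulled out of the "options lnet" line of a file like the one above. This is a minimal sketch. lnet_networks is a hypothetical helper name, and it only reads the file, it does not query the running module.

```shell
#!/bin/sh
# lnet_networks FILE: print the value of networks= from the "options lnet"
# line of a modprobe.conf-style file, if such a line exists.
lnet_networks() {
    sed -n 's/^[[:space:]]*options[[:space:]][[:space:]]*lnet[[:space:]].*networks=\([^[:space:]]*\).*/\1/p' "$1"
}

# Hypothetical usage:
#   lnet_networks /etc/modprobe.conf
```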

Sample of messages:
Feb 6 14:34:21 FedoraCore121 root: =========start lnet and debug
Feb 6 14:34:27 FedoraCore121 kernel: Lustre: 2306:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Feb 6 14:34:46 FedoraCore121 kernel: Lustre: Added LNI [EMAIL PROTECTED] [8/256]
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_create_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dereg_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dereg_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_reject
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_reject
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_disconnect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_disconnect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_route
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_resolve_route
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_bind_addr
Feb 6 14:34:46 FedoraCore121 modprobe: FATAL: Error inserting ko2iblnd (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko): Unknown symbol in module, or unknown parameter (see dmesg)
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_bind_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_destroy_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_destroy_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_create_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_create_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_listen
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_listen
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_get_dma_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_get_dma_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_alloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_alloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_connect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_connect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_modify_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_modify_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_destroy_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_destroy_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol rdma_accept
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol rdma_accept
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about version of symbol ib_dealloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol ib_dealloc_pd
Feb 6 14:34:47 FedoraCore121 kernel: Lustre: Removed LNI [EMAIL PROTECTED]
Feb 6 14:35:01 FedoraCore121 sendmail[2268]: sql_select option missing
Feb 6 14:35:01 FedoraCore121 sendmail[2268]: auxpropfunc error no mechanism available
------------------------------------------------------------------------
*From:* Eric Barton [mailto:[EMAIL PROTECTED]
*Sent:* Monday, February 05, 2007 10:42 AM
*To:* Snider, Tim; [email protected]
*Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with
1.5.95

Is that OFED 1.1?  Does /etc/modprobe.conf contain...

    options lnet networks=o2ib

...or the equivalent using ip2nets? If this isn't clear, please see the
lustre manual for an explanation of network setup.

Can you bring up lustre networking on the mgs and a client node...

    modprobe lnet; lctl net up

...and then check /proc/sys/lnet/nis? It should list the local NIDs
(e.g....

    <ipoib IP address>@o2ib
    [EMAIL PROTECTED]

...). If that looks OK, run an lnet ping from the client to the MGS...

    lctl ping [EMAIL PROTECTED]

Please note that by default, network error messages are logged
internally, but are not printed to the console or /var/log/messages, so
it may help to "echo + neterror > /proc/sys/lnet/printk" to enable
verbose network messages while you are debugging connectivity.

    Cheers,
                       Eric
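
The checks in this message can be collected into one small script. This is only a sketch under assumptions: the MGS NID below is a made-up placeholder (the real NIDs are not shown in the thread), the run and lnet_check names are hypothetical, and enabling neterror right after loading lnet is an ordering choice of this sketch. With DRY_RUN=1, the default here, the commands are only printed and nothing on the node is changed.

```shell
#!/bin/sh
# Placeholder NID; substitute the MGS's real <ipoib IP address>@o2ib.
MGS_NID=${MGS_NID:-192.168.0.1@o2ib}
# DRY_RUN=1 prints the commands instead of executing them.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

lnet_check() {
    run modprobe lnet
    # Print network errors to the console and /var/log/messages.
    run sh -c 'echo + neterror > /proc/sys/lnet/printk'
    run lctl net up
    # Should list the local NIDs.
    run cat /proc/sys/lnet/nis
    # Run this one from the client, against the MGS.
    run lctl ping "$MGS_NID"
}
```

Calling lnet_check with the defaults prints the five commands in order. Setting DRY_RUN=0 as root would actually execute them.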


------------------------------------------------------------------------
    *From:* [EMAIL PROTECTED]
    [mailto:[EMAIL PROTECTED] *On Behalf Of
    *Snider, Tim
    *Sent:* 05 February 2007 2:40 PM
    *To:* [email protected]
    *Subject:* [Lustre-discuss] [Lustre-devel] Using Infiniband with
    1.5.95

    We're trying to set up a Lustre configuration using infiniband
    ipoib with 1.5.95. openib 1.1 (was formerly openib gen 2) is
    installed. We can successfully ping between the mdt/mgs and ost
    servers using the ipoib address. Lustre fs creation is
    "apparently" successful. Mounting the lustre device fails.
    1.    Does 1.5.95 work properly with ipoib?
    2.    What is the proper form of mgsnode specification, should
    o2ib or openib be used?
    2.a        Should we specify the ipoib address or the adapter/port #?
    The ost command line we're trying is:
         mkfs.lustre --fsname=testfs [EMAIL PROTECTED] /dev/sdb1
    Thanks,
    Timothy Snider
    Storage Architect
    Strategic Planning, Technology and Architecture

    LSI Logic Corporation
    3718 North Rock Road
    Wichita, KS 67226
    (316) 636-8736
    [EMAIL PROTECTED]

------------------------------------------------------------------------

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss