Thx
-----Original Message-----
From: Nathaniel Rutman [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 06, 2007 4:25 PM
To: Snider, Tim
Cc: Eric Barton; [email protected]
Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with
1.5.95
This is strictly a compile issue -- Lustre won't work over o2ib until
the ko2iblnd module can load successfully.
The default header path the o2iblnd uses is $LINUX/drivers/infiniband -
you need to make sure Lustre is compiled against the o2ib/OFED headers
that your kernel modules actually use. The ./configure flag for Lustre
is:
    --with-o2ib=path        build o2iblnd against path
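As a rough sketch of how that flag might be applied: the helper below only prints the ./configure command after checking that the given tree looks like an OFED source tree. The function name and the include/rdma/ib_verbs.h check are illustrative assumptions, not part of Lustre; the path you pass should be the tree your loaded ib_* modules were actually built from.

```shell
# Sketch only: print the ./configure command for building Lustre's
# o2iblnd against a given OFED source tree.
o2ib_configure_cmd() {
    ofed_src=$1   # hypothetical: tree the loaded ib_* modules were built from
    # Assumption: a usable tree has the verbs header at include/rdma/ib_verbs.h.
    if [ ! -f "$ofed_src/include/rdma/ib_verbs.h" ]; then
        echo "no include/rdma/ib_verbs.h under $ofed_src" >&2
        return 1
    fi
    echo "./configure --with-o2ib=$ofed_src"
}
```

Run it with the OFED tree as its only argument, then run the printed command from the top of the Lustre source tree and rebuild and reinstall the modules.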
HTH
Snider, Tim wrote:
Ok - more details. ipoib itself is working on all servers. There are
ipoib ping utilities that run successfully between all the servers in
the fabric.
I was able to successfully mount on the mdt/mgs after installing
Lustre modules by hand using the force option.
Mounting the OST device still fails. ptlrpc refuses to load manually
with the force option. All kernel / lustre versions are identical
between the servers.
What am I missing?
uname -a
Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] mnt]# modprobe -vf ptlrpc
insmod /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko
FATAL: Error inserting ptlrpc
(/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko):
Input/output error
/var/log/messages
Feb 6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic,
tainting kernel.
Feb 6 17:03:20 FedoraCore120 kernel: Lustre: Added LNI
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> [8/256]
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
ib_create_cq
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
rdma_resolve_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol ib_dereg_mr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
ib_dereg_mr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol rdma_reject
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
rdma_reject
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol rdma_disconnect
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
rdma_disconnect
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol rdma_resolve_route
Feb 6 17:03:20 FedoraCore120 modprobe: FATAL: Error inserting ko2iblnd
(/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko):
Unknown symbol in module, or unknown parameter (see dmesg)
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
rdma_resolve_route
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol rdma_bind_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol
rdma_bind_addr
Feb 6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol rdma_create_qp
<<< similar messages continue for a while, same as before >>>
Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: disagrees about
version of symbol ib_dealloc_pd
Feb 6 17:03:21 FedoraCore120 kernel: ko2iblnd: Unknown symbol
ib_dealloc_pd
Feb 6 17:03:21 FedoraCore120 kernel: LustreError:
4753:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib,
module ko2iblnd, rc=256
Feb 6 17:03:21 FedoraCore120 kernel: Lustre: Removed LNI
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
Feb 6 17:03:21 FedoraCore120 kernel: LustreError:
4753:0:(events.c:581:ptlrpc_init_portals()) network initialisation
failed
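To see the full set of affected symbols without scrolling through the log, something like the following could be used. It is a minimal sketch: the function name is made up for illustration, and it assumes the log file holds one kernel message per line (the copy quoted above is wrapped by the mail client).

```shell
# List each symbol that ko2iblnd reported a version mismatch for, once.
# $1 is the log file to scan, e.g. /var/log/messages.
mismatched_symbols() {
    sed -n 's/.*ko2iblnd: disagrees about version of symbol \([A-Za-z0-9_]*\).*/\1/p' "$1" |
        sort -u
}
```

If every ib_* and rdma_* symbol the module imports shows up in the list, that points at the module having been built against a different set of InfiniBand headers than the ones the loaded ib modules came from, rather than at a problem with any single call.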
------------------------------------------------------------------------
*From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
*On Behalf Of* Snider, Tim
*Sent:* Tuesday, February 06, 2007 10:19 AM
*To:* Eric Barton; [email protected]
*Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with
1.5.95
I can successfully ping other servers through ib using ipoib ip
addresses.
With OFED 1.1.1 installed, loading lnet (or trying to mount a lustre
device using o2ib) fails: modprobe lnet generates complaints about
symbol versions of ib related routines.
What versions of the OFED driver (1.0, 1.1, or 1.1.1) are compatible
with Lustre 1.5.95?
Thanks for the advice.
Tim
/etc/modprobe.conf
alias eth0 tg3
alias eth1 tg3
alias scsi_hostadapter mptbase
alias scsi_hostadapter1 mptscsih
alias usb-controller ohci-hcd
options lnet networks=tcp,o2ib # specify both ethernet and ib networks for Lustre.
alias ib0 ib_ipoib
alias ib1 ib_ipoib
alias net-pf-27 ib_sdp
Sample of messages:
Feb 6 14:34:21 FedoraCore121 root: =========start lnet and debug
Feb 6 14:34:27 FedoraCore121 kernel: Lustre:
2306:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
Feb 6 14:34:46 FedoraCore121 kernel: Lustre: Added LNI
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> [8/256]
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_create_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_create_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_resolve_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_resolve_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_dereg_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_dereg_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_reject
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_reject
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_disconnect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_disconnect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_resolve_route
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_resolve_route
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_bind_addr
Feb 6 14:34:46 FedoraCore121 modprobe: FATAL: Error inserting ko2iblnd
(/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko):
Unknown symbol in module, or unknown parameter (see dmesg)
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_bind_addr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_create_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_create_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_destroy_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_destroy_cq
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_create_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_create_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_listen
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_listen
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_destroy_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_destroy_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_get_dma_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_get_dma_mr
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_alloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_alloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_connect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_connect
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_modify_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_modify_qp
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_destroy_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_destroy_id
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol rdma_accept
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
rdma_accept
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about
version of symbol ib_dealloc_pd
Feb 6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol
ib_dealloc_pd
Feb 6 14:34:47 FedoraCore121 kernel: Lustre: Removed LNI
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
Feb 6 14:35:01 FedoraCore121 sendmail[2268]: sql_select option
missing
Feb 6 14:35:01 FedoraCore121 sendmail[2268]: auxpropfunc error no
mechanism available
------------------------------------------------------------------------
*From:* Eric Barton [mailto:[EMAIL PROTECTED]
*Sent:* Monday, February 05, 2007 10:42 AM
*To:* Snider, Tim; [email protected]
*Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with
1.5.95
Is that OFED 1.1? Does /etc/modprobe.conf contain...
options lnet networks=o2ib
...or the equivalent using ip2nets? If this isn't clear, please see
the lustre manual for an explanation of network setup.
Can you bring up lustre networking on the mgs and a client node...
modprobe lnet; lctl net up
...and then check /proc/sys/lnet/nis? It should list the local NIDs
(e.g....
<ipoib IP address>@o2ib
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
...). If that looks OK, run an lnet ping from the client to the
MGS...
lctl ping [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
Please note that by default, network error messages are logged
internally, but are not printed to the console or /var/log/messages,
so it may help to "echo + neterror > /proc/sys/lnet/printk" to enable
verbose network messages while you are debugging connectivity.
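The checks above could be strung together roughly as follows. This is a sketch, not a supported tool: the run and lnet_check names and the DRY_RUN switch are invented here, and the NID argument is a placeholder you supply. Only the commands themselves come from this message.

```shell
# Run a command, or just print it when DRY_RUN=1 (the real commands
# need root and a node with the Lustre modules installed).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Bring up LNET, show the local NIDs, then ping the MGS NID given as $1.
# Stops at the first step that fails.
lnet_check() {
    mgs_nid=$1   # placeholder, of the form <ipoib IP address>@o2ib
    run modprobe lnet &&
    run lctl net up &&
    run cat /proc/sys/lnet/nis &&
    run lctl ping "$mgs_nid"
}
```

Setting DRY_RUN=1 before calling lnet_check with the MGS NID prints the four commands without touching the system, which is a cheap way to confirm the NID is spelled the way you intend before running it for real.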
Cheers,
Eric
------------------------------------------------------------------------
*From:* [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
*On Behalf Of* Snider, Tim
*Sent:* 05 February 2007 2:40 PM
*To:* [email protected]
*Subject:* [Lustre-discuss] [Lustre-devel] Using Infiniband with
1.5.95
We're trying to set up a Lustre configuration using InfiniBand
ipoib with 1.5.95. openib 1.1 (formerly openib gen 2) is
installed. We can successfully ping between the mdt/mgs and ost
servers using the ipoib address. Lustre fs creation is
"apparently" successful. Mounting the lustre device fails.
1. Does 1.5.95 work properly with ipoib?
2. What is the proper form of mgsnode specification: should
o2ib or openib be used?
2.a Should we specify the ipoib address or the adapter/port #?
The ost command line we're trying is:
mkfs.lustre --fsname=testfs [EMAIL PROTECTED]
<mailto:[EMAIL PROTECTED]> /dev/sdb1
Thanks,
Timothy Snider
Storage Architect
Strategic Planning, Technology and Architecture
LSI Logic Corporation
3718 North Rock Road
Wichita, KS 67226
(316) 636-8736
[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
------------------------------------------------------------------------
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss