Hi Tim,

On Wed, Feb 07, 2007 at 06:11:39AM -0700, Snider, Tim wrote:
>Ok - Can you provide more insight?  I'm using the same distro, kernel,

I'm using 1.5.97 (beta7) which, unlike beta5, has (possibly useless) IB
modules in the RHEL kernel rpms from clusterfs.

can you please try 1.5.97 and let me know how you go?

>and  Lustre RPMs on all the servers. Why would modules load on one
>server but not the others? 
>And a more practical point: what target do I build?
>make
>make install
>make rpms?

I was hoping to avoid all that :-/ hence my previous email.

it's not clear to me what's the best order in which to build/install
new OFED and patch the RHEL kernel build tree with Lustre. something to
keep me amused today.

cheers,
robin

>Thx
>
>-----Original Message-----
>From: Nathaniel Rutman [mailto:[EMAIL PROTECTED] 
>Sent: Tuesday, February 06, 2007 4:25 PM
>To: Snider, Tim
>Cc: Eric Barton; [email protected]
>Subject: Re: [Lustre-discuss] [Lustre-devel] Using Infiniband with
>1.5.95
>
>This is strictly a compile issue -- Lustre won't work over o2ib until
>the ko2iblnd module can load successfully.
>The default header path the o2iblnd uses is $LINUX/drivers/infiniband -
>you need to make sure Lustre is compiled against the o2ib/OFED headers
>that your kernel modules actually use.  The ./configure flag for Lustre
>is:
>  --with-o2ib=path        build o2iblnd against path
>HTH
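
A quick way to confirm that the failure is limited to the IB/RDMA symbols is to pull the "disagrees about version of symbol" entries out of the kernel log and group them by module. The sketch below is only an illustration: the function name `mismatched_symbols` is made up here, and it assumes unwrapped log text as it appears in /var/log/messages (not the line-wrapped, quoted form seen in this thread).

```python
import re

# Kernel message logged when a module's recorded symbol version differs
# from the version exported by the running kernel (or another module).
MISMATCH = re.compile(r"(\w+): disagrees about\s+version of symbol (\w+)")


def mismatched_symbols(log_text):
    """Return {module: sorted list of symbols} for every version mismatch."""
    found = {}
    for module, symbol in MISMATCH.findall(log_text):
        found.setdefault(module, set()).add(symbol)
    return {module: sorted(symbols) for module, symbols in found.items()}


# Sample lines copied (unwrapped) from the log excerpt in this thread.
sample = """\
Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol ib_create_cq
Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol ib_create_cq
Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about version of symbol rdma_resolve_addr
"""

print(mismatched_symbols(sample))
```

If every reported symbol starts with ib_ or rdma_ and the only module listed is ko2iblnd, that is consistent with the explanation above: ko2iblnd was built against different IB headers than the IB modules actually loaded.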
> 
>
>Snider, Tim wrote:
>> Ok - more details. ipoib itself is working on all servers. there are 
>> ipoib ping utilities that run successfully between all the servers in 
>> the fabric.
>> I was able to successfully mount on the mdt/mgs after installing 
>> Lustre modules by hand using the force option.
>> Mounting the OST device still fails. ptlrpc refuses to load manually 
>> with the force option. All kernel / lustre versions are identical 
>> between the servers.
>>  
>> What am I missing?
>>  
>> uname -a
>>         Linux FedoraCore120 2.6.9-42.EL_lustre.1.5.95smp #1 SMP Thu Sep 28 06:36:13 MDT 2006 i686 i686 i386 GNU/Linux
>> [EMAIL PROTECTED] mnt]# modprobe -vf ptlrpc
>>         insmod /lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko
>>         FATAL: Error inserting ptlrpc (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/fs/lustre/ptlrpc.ko): Input/output error
>> /var/log/messages
>>     Feb  6 17:03:20 FedoraCore120 kernel: ptlrpc: no version magic, 
>> tainting kernel.
>>     Feb  6 17:03:20 FedoraCore120 kernel: Lustre: Added LNI 
>> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> [8/256]
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_create_cq
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> ib_create_cq
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_resolve_addr
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> rdma_resolve_addr
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_dereg_mr
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> ib_dereg_mr
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_reject
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> rdma_reject
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_disconnect
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> rdma_disconnect
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_resolve_route
>>     Feb  6 17:03:20 FedoraCore120 modprobe: FATAL: Error inserting ko2iblnd
>> (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko):
>> Unknown symbol in module, or unknown parameter (see dmesg)
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> rdma_resolve_route
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_bind_addr
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> rdma_bind_addr
>>     Feb  6 17:03:20 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_create_qp
>>                 <<<similar messages are displayed for awhile same as
>> before>>>
>>     Feb  6 17:03:21 FedoraCore120 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_dealloc_pd
>>     Feb  6 17:03:21 FedoraCore120 kernel: ko2iblnd: Unknown symbol 
>> ib_dealloc_pd
>>     Feb  6 17:03:21 FedoraCore120 kernel: LustreError: 
>> 4753:0:(api-ni.c:1002:lnet_startup_lndnis()) Can't load LND o2ib, 
>> module ko2iblnd, rc=256
>>     Feb  6 17:03:21 FedoraCore120 kernel: Lustre: Removed LNI 
>> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>>     Feb  6 17:03:21 FedoraCore120 kernel: LustreError: 
>> 4753:0:(events.c:581:ptlrpc_init_portals()) network initialisation 
>> failed
>>  
>>  
>>
>> ----------------------------------------------------------------------
>> --
>> *From:* [EMAIL PROTECTED]
>> [mailto:[EMAIL PROTECTED] *On Behalf Of *Snider, 
>> Tim
>> *Sent:* Tuesday, February 06, 2007 10:19 AM
>> *To:* Eric Barton; [email protected]
>> *Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with
>> 1.5.95
>>
>> I can successfully ping other servers through IB using their IPoIB
>> addresses. With OFED 1.1.1, loading lnet (modprobe lnet) or trying to
>> mount a Lustre device over o2ib generates complaints about symbol
>> versions of IB-related routines.
>> What versions of the OFED driver (1.0, 1.1, or 1.1.1) are compatible 
>> with Lustre 1.5.95?
>>  
>> Thanks for the advice.
>> Tim
>>  
>> /etc/modprobe.conf
>>  alias eth0 tg3
>>  alias eth1 tg3
>>  alias scsi_hostadapter mptbase
>>  alias scsi_hostadapter1 mptscsih
>>  alias usb-controller ohci-hcd
>>  options lnet networks=tcp,o2ib    # specify both ethernet and ib 
>> networks for Lustre.
>>  alias ib0 ib_ipoib
>>  alias ib1 ib_ipoib
>>  alias net-pf-27 ib_sdp
>>
>> Sample of messages:
>>    Feb  6 14:34:21 FedoraCore121 root: =========start lnet and debug
>>    Feb  6 14:34:27 FedoraCore121 kernel: Lustre: 
>> 2306:0:(module.c:382:init_libcfs_module()) maximum lustre stack 8192
>>    Feb  6 14:34:46 FedoraCore121 kernel: Lustre: Added LNI 
>> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]> [8/256]
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_create_cq
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_create_cq
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_resolve_addr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_resolve_addr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_dereg_mr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_dereg_mr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_reject
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_reject
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_disconnect
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_disconnect
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_resolve_route
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_resolve_route
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_bind_addr
>>    Feb  6 14:34:46 FedoraCore121 modprobe: FATAL: Error inserting ko2iblnd
>> (/lib/modules/2.6.9-42.EL_lustre.1.5.95smp/kernel/net/lustre/ko2iblnd.ko):
>> Unknown symbol in module, or unknown parameter (see dmesg)
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_bind_addr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_create_qp
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_create_qp
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_destroy_cq
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_destroy_cq
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_create_id
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_create_id
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_listen
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_listen
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_destroy_qp
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_destroy_qp
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_get_dma_mr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_get_dma_mr
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_alloc_pd
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_alloc_pd
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_connect
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_connect
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_modify_qp
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_modify_qp
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_destroy_id
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_destroy_id
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol rdma_accept
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> rdma_accept
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: disagrees about 
>> version of symbol ib_dealloc_pd
>>    Feb  6 14:34:46 FedoraCore121 kernel: ko2iblnd: Unknown symbol 
>> ib_dealloc_pd
>>    Feb  6 14:34:47 FedoraCore121 kernel: Lustre: Removed LNI 
>> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>>    Feb  6 14:35:01 FedoraCore121 sendmail[2268]: sql_select option missing
>>    Feb  6 14:35:01 FedoraCore121 sendmail[2268]: auxpropfunc error no 
>> mechanism available
>>  
>>
>> ----------------------------------------------------------------------
>> --
>> *From:* Eric Barton [mailto:[EMAIL PROTECTED]
>> *Sent:* Monday, February 05, 2007 10:42 AM
>> *To:* Snider, Tim; [email protected]
>> *Subject:* RE: [Lustre-discuss] [Lustre-devel] Using Infiniband with
>> 1.5.95
>>
>> Is that OFED 1.1?  Does /etc/modprobe.conf contain...
>>  
>> options lnet networks=o2ib
>>  
>> ...or the equivalent using ip2nets?   If this isn't clear, please see 
>> the lustre manual for an explanation of network setup. 
>>  
>> Can you bring up lustre networking on the mgs and a client node...
>>  
>> modprobe lnet; lctl net up
>>  
>> ...and then check /proc/sys/lnet/nis? It should list the local NIDs 
>> (e.g....
>>  
>> <ipoib IP address>@o2ib
>> [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>>  
>> ...).  If that looks OK, run an lnet ping from the client to the MGS...
>>  
>> lctl ping [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
>>  
>> Please note that by default, network error messages are logged 
>> internally, but are not printed to the console or /var/log/messages, 
>> so it may help to "echo + neterror > /proc/sys/lnet/printk" to enable 
>> verbose network messages while you are debugging connectivity.
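
The two pieces of configuration mentioned above (the "options lnet networks=..." line and the "<ipoib IP address>@o2ib" NID form) can be generated and sanity-checked with a few lines of Python. This is a minimal sketch under stated assumptions: the helper names `make_nid` and `lnet_options` are hypothetical, and the address 192.168.0.10 is a placeholder, not one taken from this thread.

```python
import ipaddress


def make_nid(ipoib_addr, network="o2ib"):
    """Format an LNET NID as <address>@<network>.

    The address is validated as IPv4 first; a malformed address raises
    ValueError instead of producing a bogus NID.
    """
    addr = ipaddress.IPv4Address(ipoib_addr)
    return f"{addr}@{network}"


def lnet_options(networks):
    """Build the /etc/modprobe.conf line selecting the LNET networks."""
    return "options lnet networks=" + ",".join(networks)


# Placeholder address for illustration only.
print(make_nid("192.168.0.10"))
print(lnet_options(["tcp", "o2ib"]))
```

The NID printed by the first call is the form that should show up in /proc/sys/lnet/nis once "lctl net up" succeeds, and it is also the form to pass to "lctl ping".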
>>
>>     Cheers,
>>                        Eric
>>
>>
>------------------------------------------------------------------------
>>     *From:* [EMAIL PROTECTED]
>>     [mailto:[EMAIL PROTECTED] *On Behalf Of
>>     *Snider, Tim
>>     *Sent:* 05 February 2007 2:40 PM
>>     *To:* [email protected]
>>     *Subject:* [Lustre-discuss] [Lustre-devel] Using Infiniband with
>>     1.5.95
>>
>>     We're trying to set up a Lustre configuration using infiniband
>>     ipoib with 1.5.95. openib 1.1 (formerly openib gen 2) is
>>     installed. We can successfully ping between the mdt/mgs and ost
>>     servers using the ipoib address. Lustre fs creation is
>>     "apparently" successful. Mounting the lustre device fails.
>>     1.    Does 1.5.95 work properly with ipoib?
>>     2.    What is the proper form of mgsnode specification, should
>>     o2ib or openib be used?
>>     2.a        Should we specify the ipoib address or the adapter/port #?
>>      
>>     The ost command line we're trying is:
>>          mkfs.lustre --fsname=testfs [EMAIL PROTECTED]
>>     <mailto:[EMAIL PROTECTED]> /dev/sdb1
>>      
>>     Thanks,
>>     Timothy Snider
>>     Storage Architect
>>     Strategic Planning, Technology and Architecture
>>
>>     LSI Logic Corporation
>>     3718 North Rock Road
>>     Wichita, KS 67226
>>     (316) 636-8736
>>     [EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>_
>>
>>      
>>
>> ----------------------------------------------------------------------
>> --
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
>>   
>

