Dear Andreas,

I wonder if there is any further advice you can kindly offer on how to troubleshoot the failure to bring up the lustre module?
Many thanks.

Peter

-----Original Message-----
From: Chiu, Peter (STFC,RAL,RALSP)
Sent: 11 May 2011 11:50
To: Andreas Dilger
Cc: <lustre-discuss@lists.lustre.org>; Chiu, Peter (STFC,RAL,RALSP)
Subject: RE: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working

Understood, Andreas,

Just to add: the same approach works for SLES 11 using a xen kernel (2.6.27.54-0.2-xen). The Lustre Client rpms work okay there:

cmip-proc1:~ # cat /etc/issue
Welcome to SUSE Linux Enterprise Server 11 (x86_64) - Kernel \r (\l).

cmip-proc1:~ # uname -a
Linux cmip-proc1 2.6.27.54-0.2-xen #1 SMP 2010-10-19 18:40:07 +0200 x86_64 x86_64 x86_64 GNU/Linux

cmip-proc1:~ # df -h /disks/ceda1
Filesystem            Size  Used Avail Use% Mounted on
130.246.191.64:130.246.191.65@tcp0:/ceda1
                       51T  130G   48T   1% /disks/ceda1

SLES 11 SP1 is a service pack update to SLES 11 (now on 2.6.32.29-0.3-xen). Is it possible to find out what the problem is?

Regards,

Peter

-----Original Message-----
From: Andreas Dilger [mailto:adil...@whamcloud.com]
Sent: 11 May 2011 10:11
To: Chiu, Peter (STFC,RAL,RALSP)
Cc: <lustre-discuss@lists.lustre.org>; Chiu, Peter (STFC,RAL,RALSP)
Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working

The only other potential problem I see is that you are using a xen kernel and this is somehow causing problems.

Cheers, Andreas

On 2011-05-11, at 1:33 AM, <peter.c...@stfc.ac.uk> wrote:

> Dear Andreas,
>
> Many thanks for your response.
>
> Below are further details on this.
>
> I shall be grateful for your advice on this.
>
> Regards,
>
> Peter
> ====================================================================================================
>
> The system is:
>
> cmip-proc8:/etc # uname -a
> Linux cmip-proc8.badc.rl.ac.uk 2.6.32.29-0.3-xen #1 SMP 2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux
>
> /usr/src/linux is a symlink pointing to the source corresponding to
> linux-2.6.32.29-0.3-obj:
>
> cmip-proc8:/etc # ls -l /usr/src
> total 24
> drwxr-xr-x  3 root root 4096 2011-05-09 08:31 debug
> lrwxrwxrwx  1 root root   19 2011-03-20 15:54 linux -> linux-2.6.32.29-0.3
> drwxr-xr-x 25 root root 4096 2011-05-09 08:49 linux-2.6.32.29-0.3
> drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-2.6.32.29-0.3-obj
> drwxr-xr-x  3 root root 4096 2011-03-20 15:54 linux-obj
> drwxr-xr-x 10 root root 4096 2011-05-09 08:31 lustre-1.8.5
> drwxr-xr-x  7 root root 4096 2011-03-20 14:58 packages
> cmip-proc8:/etc #
>
> cmip-proc8:~ # ls /usr/local/kits/lustre-1.8.5
>
> aclocal.m4       config.h.in    install-sh           Makefile
> autoMakefile     config.log     ldiskfs              Makefile.in
> autoMakefile.am  config.status  libsysio             missing
> autoMakefile.in  config.sub     lnet                 mkinstalldirs
> build            configure      lustre               README
> ChangeLog        configure.ac   lustre-1.8.5.tar.gz  Rules
> compile          COPYING        lustre-iokit         snmp
> config.guess     debian         lustre.spec          stamp-h1
> config.h         depcomp        lustre.spec.in       tree_status
> cmip-proc8:~ #
>
> The build with ./configure and make rpms produced rpms that are installable:
>
> cmip-proc8:/etc # ls -ls /usr/src/packages/RPMS/x86_64/*1.8.5*
>  4024 -rw-r--r-- 1 root root  4112883 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> 15532 -rw-r--r-- 1 root root 15881360 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
>  1332 -rw-r--r-- 1 root root  1358924 2011-05-09 08:54 /usr/src/packages/RPMS/x86_64/lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
>  1416 -rw-r--r-- 1 root root  1441937 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
>  3524 -rw-r--r-- 1 root root  3602163 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
>  2600 -rw-r--r-- 1 root root  2656393 2011-05-09 08:53 /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
>
>
> cmip-proc8:/etc # rpm -e lustre-tests
> cmip-proc8:/etc # rpm -e lustre
> cmip-proc8:/etc # rpm -e lustre-modules
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing...        ########################################### [100%]
>    1:lustre-modules ########################################### [100%]
> Congratulations on finishing your Lustre installation! To register
> your copy of Lustre and find out more about Lustre Support, Service,
> and Training offerings please visit
>
> http://www.sun.com/software/products/lustre/lustre_reg.jsp
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing...        ########################################### [100%]
>    1:lustre         ########################################### [100%]
> cmip-proc8:/etc # rpm -ivh /usr/src/packages/RPMS/x86_64/lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815.x86_64.rpm
> Preparing...        ########################################### [100%]
>    1:lustre-tests   ########################################### [100%]
> cmip-proc8:/etc #
>
> ...
>
> cmip-proc8:/etc # rpm -qa | grep lustre
> lustre-debuginfo-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-modules-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-debugsource-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-tests-1.8.5-2.6.32.29_0.3_xen_201105090815
> lustre-source-1.8.5-2.6.32.29_0.3_xen_201105090815
>
> The problem reproduces:
>
> cmip-proc8:~ # cp /var/log/messages /tmp/m0
> cmip-proc8:~ # dmesg > /tmp/d0
> cmip-proc8:~ # lsmod | grep lustre
> cmip-proc8:~ # modprobe lustre
> Killed
> cmip-proc8:~ # dmesg > /tmp/d1
> cmip-proc8:~ # cp /var/log/messages /tmp/m1
> cmip-proc8:~ # diff /tmp/d0 /tmp/d1
> 193a194,235
>> [ 84.786822] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:30:1e:5d:54:80:08:00 SRC=130.246.188.226 DST=224.0.0.1 LEN=28 TOS=0x00 PREC=0x00 TTL=1 ID=34816 PROTO=2
>> [ 104.171306] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>> [ 104.171317] IP: [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [ 104.171328] PGD 7d9d0067 PUD 7d94c067 PMD 0
>> [ 104.171333] Oops: 0000 [#1] SMP
>> [ 104.171336] last sysfs file: /sys/module/ip_tables/initstate
>> [ 104.171339] CPU 0
>> [ 104.171341] Modules linked in: lnet(N+) lvfs(N) libcfs(N) iptable_nat nf_nat xt_tcpudp xt_pkttype ipt_LOG xt_limit autofs4 binfmt_misc microcode xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6_tables x_tables fuse loop dm_mod joydev rtc_core rtc_lib xennet ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
>> [ 104.171373] Supported: Yes
>> [ 104.171376] Pid: 3441, comm: modprobe Tainted: G N 2.6.32.29-0.3-xen #1
>> [ 104.171379] RIP: e030:[<ffffffff8002c3d2>] [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [ 104.171384] RSP: e02b:ffff88007edade38 EFLAGS: 00010082
>> [ 104.171387] RAX: 0000000000000001 RBX: 0000000000009700 RCX: dead000000100100
>> [ 104.171390] RDX: 0000000000000000 RSI: ffff88007edade88 RDI: 0000000000000000
>> [ 104.171393] RBP: ffff88007edade58 R08: ffffffffa0252fb6 R09: 0000000000000000
>> [ 104.171396] R10: 0000000000000001 R11: ffffffff805f4200 R12: 0000000000009700
>> [ 104.171399] R13: 0000000000000000 R14: ffff88007edade88 R15: 000000000000000f
>> [ 104.171406] FS: 00007f541715a700(0000) GS:ffff8800013c1000(0000) knlGS:0000000000000000
>> [ 104.171409] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
>> [ 104.171412] CR2: 0000000000000008 CR3: 000000007d905000 CR4: 0000000000002660
>> [ 104.171415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> [ 104.171418] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>> [ 104.171421] Process modprobe (pid: 3441, threadinfo ffff88007edac000, task ffff88007df8a400)
>> [ 104.171424] Stack:
>> [ 104.171426] ffffffffa02579f8 0000000000000000 0000000000623da0 0000000000623d30
>> [ 104.171430] <0> ffff88007edadeb8 ffffffff80038588 000000007fc11fa0 00000000a02579f8
>> [ 104.171435] <0> 00000000a0243060 0000000000000000 0000000000000001 ffffffffa02579f8
>> [ 104.171441] Call Trace:
>> [ 104.171449] [<ffffffff80038588>] try_to_wake_up+0x48/0x420
>> [ 104.171455] [<ffffffff8005b2e8>] up+0x48/0x50
>> [ 104.171464] [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
>> [ 104.171478] [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
>> [ 104.171489] [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
>> [ 104.171495] [<ffffffff8006d154>] sys_init_module+0xe4/0x270
>> [ 104.171500] [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
>> [ 104.171506] [<00007f5416cf3f7a>] 0x7f5416cf3f7a
>> [ 104.171508] Code: 1c 24 49 89 f6 4c 89 64 24 08 49 c7 c4 00 97 00 00 65 8a 04 25 c1 67 00 00 65 c6 04 25 c1 67 00 00 01 0f b6 c0 4c 89 e3 49 89 06 <49> 8b 45 08 8b 40 18 48 03 1c c5 80 ae 62 80 48 89 df e8 f7 87
>> [ 104.171544] RIP [<ffffffff8002c3d2>] task_rq_lock+0x42/0xa0
>> [ 104.171548] RSP <ffff88007edade38>
>> [ 104.171550] CR2: 0000000000000008
>> [ 104.171553] ---[ end trace 34c6e019e0aea7d2 ]---
>> [ 106.380129] SFW2-INext-DROP-DEFLT IN=eth0 OUT= MAC=01:00:5e:00:00:01:00:17:f2:0e:c4:a1:08:00 SRC=130.246.188.58 DST=224.0.0.1 LEN=44 TOS=0x00 PREC=0x00 TTL=1 ID=27534 PROTO=UDP SPT=54228 DPT=8612 LEN=24
> cmip-proc8:~ #
>
>
> -----Original Message-----
> From: Andreas Dilger [mailto:adil...@whamcloud.com]
> Sent: 10 May 2011 21:48
> To: Chiu, Peter (STFC,RAL,RALSP)
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working
>
> On May 9, 2011, at 11:38, <peter.c...@stfc.ac.uk> <peter.c...@stfc.ac.uk>
> wrote:
>> The rpms lustre-modules, lustre and lustre-tests were then installed
>> smoothly without any complaints.
>>
>> But the subsequent "modprobe lustre" will return a "Killed" message, with no
>> lustre module loaded.
>>
>> dmesg also reveals "BUG: unable to handle kernel NULL pointer dereference
>> at 0000000000000008"
>>
>> A second modprobe lustre command will then hang, again with no module loaded.
>> Subsequently the client is not able to mount the lustre storage.
>>
>> Can anyone shed some light as to what has gone wrong here please?
>>
>> ./configure --with-linux=/usr/src/linux --with-linux-obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen
>
> Are you sure that "/usr/src/linux" points to the same source as
> "/usr/src/linux-2.6.32.29-0.3-obj"? Is that a symlink? Normally the source
> and -obj files have a very similar pathname (i.e. just with "-obj" suffix
> difference).
>
>>> [ 168.647996] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
>>> [ 168.648066] Pid: 3445, comm: modprobe Tainted: G N 2.6.32.29-0.3-xen #1
>> 0000000000000400
>>> [ 168.648110] Process modprobe (pid: 3445, threadinfo ffff88007efa4000, task ffff88007e9100c0)
>>> [ 168.648129] Call Trace:
>>> [ 168.648138] [<ffffffff80038588>] try_to_wake_up+0x48/0x420
>>> [ 168.648143] [<ffffffff8005b2e8>] up+0x48/0x50
>>> [ 168.648153] [<ffffffffa0230d92>] LNetInit+0x92/0xc0 [lnet]
>>> [ 168.648167] [<ffffffffa02430ac>] init_lnet+0x4c/0x280 [lnet]
>>> [ 168.648178] [<ffffffff80004045>] do_one_initcall+0x35/0x1b0
>>> [ 168.648184] [<ffffffff8006d154>] sys_init_module+0xe4/0x270
>>> [ 168.648189] [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
>>> [ 168.648194] [<00007f3f40bc9f7a>] 0x7f3f40bc9f7a
>>
>> I have tried Lustre-1.8.4, but got the same result.
>> I have also tried to follow the 1.8 Operations Manual to locate the
>> diagnostic tools, but the link wiki.lustre.org is no longer valid.
>
> This looks like a pretty serious error to oops during module insertion, and
> I'd suspect the build environment before any particular Lustre code.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Engineer
> Whamcloud, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
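
Andreas's question in the thread, whether /usr/src/linux points to the same source as the -obj tree, can be checked mechanically rather than by eye. The following is only a minimal sketch under one assumption: his rule of thumb that the object tree path is the source path plus an "-obj" suffix. The helper name obj_matches_source is hypothetical and is not part of Lustre or the kernel tools. A match shows only that the two paths are named consistently; it does not show that the build environment is otherwise sound.

```shell
#!/bin/sh
# Sketch only: obj_matches_source is a hypothetical helper, not a Lustre tool.
# It tests the naming rule of thumb from the thread: the kernel object tree
# should live at (or below) "<source tree>-obj".

# obj_matches_source SRC OBJ
#   SRC: resolved kernel source directory
#   OBJ: directory passed to --with-linux-obj
# Returns 0 if OBJ is "<SRC>-obj" or a subdirectory of it, 1 otherwise.
obj_matches_source() {
    src=${1%/}                      # drop one trailing slash, if any
    case "$2" in
        "${src}-obj"|"${src}-obj"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use on a live system (left commented out so the block only
# defines the function):
#   src=$(readlink -f /usr/src/linux)    # follow the symlink to the real tree
#   obj=/usr/src/linux-2.6.32.29-0.3-obj/x86_64/xen
#   if obj_matches_source "$src" "$obj"; then
#       echo "source and obj trees are named consistently"
#   else
#       echo "source ($src) and obj ($obj) look mismatched"
#   fi
```

Going by the ls -l /usr/src output quoted in the thread, /usr/src/linux resolves to linux-2.6.32.29-0.3, so with the configure arguments shown this check would be expected to pass on cmip-proc8. That would suggest the path naming itself is not the culprit, although it says nothing about whether the xen flavour configuration in the obj tree matches the running kernel.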