I've hit the following BUG using the Lustre patchless client 1.6.3 on
Linux 2.6.20 (Fedora Core 6).  It hard-locked the machine; I was
unable to tell whether there was a subsequent panic.  This happened
while copying approx 2.7 TB from ext3 to Lustre; about 2.6 TB had been
copied, and I haven't yet verified that the data made it across okay.
There were a ton (> 1 M) of tiny files in this copy (which took about
60 hours over gigE); these cause a tremendous performance hit.  That
in itself is not surprising; I just wonder whether it's related to the
bug.
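
For reference, here is roughly how I've been sizing up the small-file
side of this, plus the single-stripe tweak I'm considering for the
next copy.  The paths are placeholders, and the positional setstripe
syntax is just what I understand the 1.6 tools to accept:

  # Count the tiny files to confirm the > 1 M figure (source path made up)
  find /data/src -type f -size -64k | wc -l

  # Put each small file on a single OST to cut per-file overhead:
  # stripe size 0 = default, start OST -1 = any, stripe count 1
  lfs setstripe /mnt/lustre/dest 0 -1 1
  cp -a /data/src/. /mnt/lustre/dest/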

I'm trying to get access to the Lustre OSTs and MDTs and will post
their logs if any exist.

I'm fairly new to Lustre; are issues like this common when using a
somewhat odd kernel?

Thanks,

-c

Linux rome.mayo.edu 2.6.20-1.2952.fc6 #1 SMP Wed May 16 18:18:22 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux


Nov 16 05:28:34 rome kernel: BUG: soft lockup detected on CPU#0!
Nov 16 05:28:34 rome kernel:
Nov 16 05:28:34 rome kernel: Call Trace:
Nov 16 05:28:34 rome kernel:  <IRQ>  [<ffffffff802b0bdb>] softlockup_tick+0xdb/0xf6
Nov 16 05:28:34 rome kernel:  [<ffffffff8028f5d0>] update_process_times+0x42/0x68
Nov 16 05:28:34 rome kernel:  [<ffffffff80271f0c>] smp_local_timer_interrupt+0x34/0x55
Nov 16 05:28:34 rome kernel:  [<ffffffff802725e8>] smp_apic_timer_interrupt+0x51/0x69
Nov 16 05:28:34 rome kernel:  [<ffffffff8025ace6>] apic_timer_interrupt+0x66/0x70
Nov 16 05:28:34 rome kernel:  <EOI>  [<ffffffff8022bde9>] dummy_inode_permission+0x0/0x3
Nov 16 05:28:34 rome kernel:  [<ffffffff8020933c>] __d_lookup+0xdd/0x110
Nov 16 05:28:34 rome kernel:  [<ffffffff8020ca8f>] do_lookup+0x2a/0x1ae
Nov 16 05:28:34 rome kernel:  [<ffffffff80209c72>] __link_path_walk+0x903/0xdb0
Nov 16 05:28:34 rome kernel:  [<ffffffff8020e78d>] link_path_walk+0x55/0xd7
Nov 16 05:28:34 rome kernel:  [<ffffffff8020c8f7>] do_path_lookup+0x1b5/0x217
Nov 16 05:28:34 rome kernel:  [<ffffffff802123d6>] getname+0x152/0x1b8
Nov 16 05:28:34 rome kernel:  [<ffffffff802237fb>] __user_walk_fd+0x37/0x4c
Nov 16 05:28:34 rome kernel:  [<ffffffff8023dc58>] vfs_lstat_fd+0x18/0x47
Nov 16 05:28:34 rome kernel:  [<ffffffff8022a50f>] sys_newlstat+0x19/0x31
Nov 16 05:28:34 rome kernel:  [<ffffffff8025a231>] tracesys+0x71/0xe1
Nov 16 05:28:34 rome kernel:  [<ffffffff8025a29c>] tracesys+0xdc/0xe1
Nov 16 05:28:34 rome kernel:
Nov 16 05:30:04 rome kernel: LustreError: 19267:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195212504, 100s ago) [EMAIL PROTECTED] x66226136/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 2 fl Rpc:/0/0 rc 0/-22
Nov 16 05:30:04 rome kernel: Lustre: protfs-OST0003-osc-ffff8100e50a5c00: Connection to service protfs-OST0003 via nid [EMAIL PROTECTED] was lost; in progress operations using this service will wait for recovery to complete.
Nov 16 05:30:09 rome kernel: LustreError: 19267:0:(client.c:969:ptlrpc_expire_one_request()) @@@ timeout (sent at 1195212509, 100s ago) [EMAIL PROTECTED] x66226138/t0 o4->[EMAIL PROTECTED]@tcp:28 lens 384/352 ref 3 fl Rpc:/0/0 rc 0/-22


etc, etc.
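
Also, those RPCs are expiring right at the 100s mark, which looks like
the default obd timeout in 1.6.  I may try bumping it on the client to
see whether anything changes; the /proc path below is what I believe
the 1.6 client exposes, and 300 is just an arbitrary test value:

  # Check the current RPC timeout (100 would match the log above)
  cat /proc/sys/lustre/timeout

  # Try a larger timeout before rerunning the copy
  echo 300 > /proc/sys/lustre/timeout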
