now onto the next problem :)
setup is a single ~1TB software raid0 OST on 'xe', an MDT/MGS on 'x19'
with SATA disk, and 16 other client nodes mounting the Lustre filesystem
and doing bonnie++ over o2ib. all standard Lustre rpms, x86_64 CentOS4.4.
IIRC the variable number of threads is a new feature in 1.5.97?
looks like there's a small bug in it...
it ends up with 512 ll_ost_io_* threads, and the last one is hung in D state
with no i/o on the filesystem, i.e.:
% ps auxw
...
root 7488 0.0 0.0 0 0 ? S 19:20 0:00 [ll_ost_io_511]
root 7489 0.0 0.0 0 0 ? D 19:20 0:00 [ll_ost_io_512]
...
after the i/o:
num threads  name
-----------  ----
          5  ll_log_comt_*
          8  ldlm_bl_*
          8  ldlm_cb_*
         15  ldlm_cn_*
        304  ll_ost_*
        512  ll_ost_io_*
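
for anyone wanting to reproduce a per-service thread count like the one above, here is a rough Python sketch. it is not necessarily how the table was generated: the function name tally_threads and the rule of folding a trailing _<number> into _* are assumptions made for illustration. it only looks at `ps auxw` lines whose command column is a bracketed kernel thread name.

```python
import re
from collections import Counter


def tally_threads(ps_lines):
    """Count kernel threads per service from `ps auxw` output lines.

    Hypothetical helper: a trailing _<number> in the thread name is
    folded into _* so ll_ost_io_511 and ll_ost_io_512 share one bucket.
    """
    counts = Counter()
    for line in ps_lines:
        # `ps auxw` has 11 columns; the last one (COMMAND) may contain spaces.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue  # header fragments, "..." lines, blank lines
        cmd = fields[10].strip()
        # Kernel threads are shown as a fully bracketed name, e.g. [ll_ost_io_511].
        if not (cmd.startswith("[") and cmd.endswith("]")):
            continue
        name = re.sub(r"_\d+$", "_*", cmd[1:-1])
        counts[name] += 1
    return counts


def format_table(counts):
    """Render the counts as 'num threads / name' rows, smallest first."""
    rows = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
    return "\n".join(f"{n:11d}  {name}" for name, n in rows)
```

feeding it the lines of `ps auxw` from the OSS and printing format_table(tally_threads(lines)) gives one row per thread family. note that any other ll_ost_<something>_NNN thread family would get its own row under this rule rather than being lumped into ll_ost_*.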
the /tmp/lustre-log.* files are kinda huge. they're at
http://www.cita.utoronto.ca/~rjh/lustre/
there's no LBUG if I restrict the number of threads with eg:
options ost oss_num_threads=300
let me know if you'd like more info or to try patches etc.
cheers,
robin
Feb 10 19:20:41 xe kernel: LustreError: 7489:0:(ost_handler.c:1555:ost_thread_init()) ASSERTION(thread->t_id < OSS_THREADS_MAX) failed
Feb 10 19:20:41 xe kernel: LustreError: 7489:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Feb 10 19:20:41 xe kernel: Lustre: 7489:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 7489
Feb 10 19:20:41 xe kernel: ll_ost_io_512 R running task 0 7489 1 7488 (L-TLB)
Feb 10 19:20:41 xe kernel: 000001013edf7ee8 000001011b21e070 000001018e064280 000001013edf7db8
Feb 10 19:20:41 xe kernel: 0000010037d01600 0000000000000200 0000000000000000 0000000000000000
Feb 10 19:20:41 xe kernel: 0000000000000459 00000000ffffffff
Feb 10 19:20:41 xe kernel: Call Trace:<ffffffffa03ca547>{:ptlrpc:ptlrpc_main+0} <ffffffff80110e1b>{child_rip+0}
Feb 10 19:20:41 xe kernel:
Feb 10 19:20:41 xe kernel: LustreError: dumping log to /tmp/lustre-log.1171095641.7489
Feb 10 19:20:42 xe kernel: Lustre: 7489:0:(linux-debug.c:98:libcfs_run_upcall()) Invoked LNET upcall /usr/lib/lustre/lnet_upcall LBUG,/scratch/lbuild-boulder/lbuild-v1_5_97_3-2.6-rhel4-x86_64/lbuild/BUILD/lustre-1.5.97/lnet/libcfs/tracefile.c,libcfs_assertion_failed,433
Feb 10 19:22:21 xe kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog triggered for pid 6797: it was inactive for 100000ms
Feb 10 19:22:21 xe kernel: Lustre: 0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 6797
Feb 10 19:22:21 xe kernel: ll_ost_io_231 S 0000000000000202 0 6797 1 6798 6796 (L-TLB)
Feb 10 19:22:21 xe kernel: 000001013edf7d78 0000000000000046 000001018e064280 0000010100000074
Feb 10 19:22:21 xe kernel: 000001013edf7ee8 0000000080110dbe 0000010001071aa0 000000011b21e070
Feb 10 19:22:21 xe kernel: 000001013d4ab800 000000000000124e
Feb 10 19:22:21 xe kernel: Call Trace:<ffffffffa03ca3c5>{:ptlrpc:ptlrpc_start_thread+1522}
Feb 10 19:22:21 xe kernel: <ffffffff801331a5>{default_wake_function+0} <ffffffffa03c858d>{:ptlrpc:ptlrpc_server_free_request+34}
Feb 10 19:22:21 xe kernel: <ffffffff80115d7b>{do_gettimeofday+77} <ffffffffa03cacec>{:ptlrpc:ptlrpc_main+1957}
Feb 10 19:22:21 xe kernel: <ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
Feb 10 19:22:21 xe kernel: <ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff80110e23>{child_rip+8}
Feb 10 19:22:21 xe kernel: <ffffffffa03ca547>{:ptlrpc:ptlrpc_main+0} <ffffffff80110e1b>{child_rip+0}
Feb 10 19:22:21 xe kernel:
Feb 10 19:22:21 xe kernel: LustreError: dumping log to /tmp/lustre-log.1171095741.6797
Feb 10 19:29:26 x19 kernel: Lustre: 4761:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 30s
Feb 10 19:29:26 x19 kernel: Lustre: 4761:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 1 previous similar message
Feb 10 19:34:29 x19 kernel: Lustre: 4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:29 x19 kernel: Lustre: 4777:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:29 x19 kernel: Lustre: 4783:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:30 x19 kernel: Lustre: 4783:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 5 previous similar messages
Feb 10 19:34:30 x19 kernel: Lustre: 4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 2 previous similar messages
Feb 10 19:40:27 x19 kernel: Lustre: 4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 38s
Feb 10 19:40:27 x19 kernel: Lustre: 4786:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 38s
Feb 10 19:40:27 x19 kernel: Lustre: 4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 4 previous similar messages
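
in case it helps with sifting through logs like the above, here is a small Python sketch that splits a Lustre syslog line into its parts. the regular expression is inferred only from the messages shown in this post, so it is an assumption rather than a documented format. it expects each message on a single line (mail-wrapped lines need rejoining first), and the names LINE_RE and parse_lustre_line are made up for this example. the second number after the pid is kept as-is without assuming what it means.

```python
import re

# Pattern inferred from the messages in this post, e.g.
#   Feb 10 19:20:41 xe kernel: LustreError: 7489:0:(ost_handler.c:1555:ost_thread_init()) ASSERTION(...) failed
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) kernel: "
    r"(?P<level>LustreError|Lustre): "
    r"(?P<pid>\d+):(?P<second>\d+):"
    r"\((?P<file>[^:()]+):(?P<line>\d+):(?P<func>\w+)\(\)\) "
    r"(?P<msg>.*)$"
)


def parse_lustre_line(text):
    """Return a dict of fields for a Lustre kernel message, or None.

    Lines without the pid:N:(file:line:func()) prefix, such as stack
    dump rows or 'dumping log to ...', do not match and yield None.
    """
    m = LINE_RE.match(text.rstrip("\n"))
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


def failed_assertions(lines):
    """Yield (host, pid, func, msg) for every ASSERTION(...) failed message."""
    for text in lines:
        rec = parse_lustre_line(text)
        if rec and rec["msg"].startswith("ASSERTION(") and rec["msg"].endswith("failed"):
            yield rec["host"], rec["pid"], rec["func"], rec["msg"]
```

run over the xe messages, failed_assertions() picks out only the ost_thread_init() line for pid 7489. the slow setattr messages from x19 parse too, but are not reported because they are not assertions.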