Robin Humble wrote:
now onto the next problem :)
setup is a single ~1TB software raid0 OST on 'xe', an MDT/MGS on 'x19'
with SATA disk, and 16 other client nodes mounting the Lustre filesystem
and doing bonnie++ over o2ib. all standard Lustre rpms, x86_64 CentOS4.4.
IIRC the variable number of threads is a new feature in 1.5.97?
looks like there's a small bug in it...
Yes. Thanks, I'll take a look.
it ends up with 512 ll_ost_io_* threads, and the last is hung in a D state
with no i/o on the filesystem. ie.
% ps auxw
...
root 7488 0.0 0.0 0 0 ? S 19:20 0:00 [ll_ost_io_511]
root 7489 0.0 0.0 0 0 ? D 19:20 0:00 [ll_ost_io_512]
...
after the i/o:
num threads name
----------- ----
5 ll_log_comt_*
8 ldlm_bl_*
8 ldlm_cb_*
15 ldlm_cn_*
304 ll_ost_*
512 ll_ost_io_*
the /tmp/lustre-log.* files are kinda huge. they're at
http://www.cita.utoronto.ca/~rjh/lustre/
there's no LBUG if I restrict the number of threads with eg:
options ost oss_num_threads=300
This effectively disables the dynamic thread creation, so it may not be
just the number of threads that matters.
let me know if you'd like more info or to try patches etc.
cheers,
robin
Feb 10 19:20:41 xe kernel: LustreError: 7489:0:(ost_handler.c:1555:ost_thread_init())
ASSERTION(thread->t_id < OSS_THREADS_MAX) failed
Feb 10 19:20:41 xe kernel: LustreError:
7489:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Feb 10 19:20:41 xe kernel: Lustre:
7489:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process
7489
Feb 10 19:20:41 xe kernel: ll_ost_io_512 R running task 0 7489 1
7488 (L-TLB)
Feb 10 19:20:41 xe kernel: 000001013edf7ee8 000001011b21e070 000001018e064280
000001013edf7db8
Feb 10 19:20:41 xe kernel: 0000010037d01600 0000000000000200
0000000000000000 0000000000000000
Feb 10 19:20:41 xe kernel: 0000000000000459 00000000ffffffff
Feb 10 19:20:41 xe kernel: Call Trace:<ffffffffa03ca547>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110e1b>{child_rip+0}
Feb 10 19:20:41 xe kernel:
Feb 10 19:20:41 xe kernel: LustreError: dumping log to
/tmp/lustre-log.1171095641.7489
Feb 10 19:20:42 xe kernel: Lustre:
7489:0:(linux-debug.c:98:libcfs_run_upcall()) Invoked LNET upcall
/usr/lib/lustre/lnet_upcall
LBUG,/scratch/lbuild-boulder/lbuild-v1_5_97_3-2.6-rhel4-x86_64/lbuild/BUILD/lustre-1.5.97/lnet/libcfs/tracefile.c,libcfs_assertion_failed,433
Feb 10 19:22:21 xe kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog
triggered for pid 6797: it was inactive for 100000ms
Feb 10 19:22:21 xe kernel: Lustre:
0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 6797
Feb 10 19:22:21 xe kernel: ll_ost_io_231 S 0000000000000202 0 6797 1
6798 6796 (L-TLB)
Feb 10 19:22:21 xe kernel: 000001013edf7d78 0000000000000046 000001018e064280
0000010100000074
Feb 10 19:22:21 xe kernel: 000001013edf7ee8 0000000080110dbe
0000010001071aa0 000000011b21e070
Feb 10 19:22:21 xe kernel: 000001013d4ab800 000000000000124e
Feb 10 19:22:21 xe kernel: Call
Trace:<ffffffffa03ca3c5>{:ptlrpc:ptlrpc_start_thread+1522}
Feb 10 19:22:21 xe kernel: <ffffffff801331a5>{default_wake_function+0}
<ffffffffa03c858d>{:ptlrpc:ptlrpc_server_free_request+34}
Feb 10 19:22:21 xe kernel: <ffffffff80115d7b>{do_gettimeofday+77}
<ffffffffa03cacec>{:ptlrpc:ptlrpc_main+1957}
Feb 10 19:22:21 xe kernel: <ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
Feb 10 19:22:21 xe kernel: <ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
<ffffffff80110e23>{child_rip+8}
Feb 10 19:22:21 xe kernel: <ffffffffa03ca547>{:ptlrpc:ptlrpc_main+0}
<ffffffff80110e1b>{child_rip+0}
Feb 10 19:22:21 xe kernel:
Feb 10 19:22:21 xe kernel: LustreError: dumping log to
/tmp/lustre-log.1171095741.6797
Feb 10 19:29:26 x19 kernel: Lustre:
4761:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 30s
Feb 10 19:29:26 x19 kernel: Lustre:
4761:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 1 previous similar message
Feb 10 19:34:29 x19 kernel: Lustre:
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:29 x19 kernel: Lustre:
4777:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:29 x19 kernel: Lustre:
4783:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:30 x19 kernel: Lustre:
4783:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 5 previous similar
messages
Feb 10 19:34:30 x19 kernel: Lustre:
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 2 previous similar
messages
Feb 10 19:40:27 x19 kernel: Lustre:
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 38s
Feb 10 19:40:27 x19 kernel: Lustre:
4786:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 38s
Feb 10 19:40:27 x19 kernel: Lustre:
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 4 previous similar
messages
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss