now onto the next problem :)

setup is a single ~1TB software raid0 OST on 'xe', an MDT/MGS on 'x19'
with SATA disk, and 16 other client nodes mounting the Lustre filesystem
and doing bonnie++ over o2ib. all standard Lustre rpms, x86_64 CentOS4.4.

IIRC the variable number of threads is a new feature in 1.5.97?
looks like there's a small bug in it...

it ends up with 512 ll_ost_io_* threads, and the last is hung in a D state
with no i/o on the filesystem. ie.
  % ps auxw
  ...
  root      7488  0.0  0.0     0    0 ?        S    19:20   0:00 [ll_ost_io_511]
  root      7489  0.0  0.0     0    0 ?        D    19:20   0:00 [ll_ost_io_512]
  ...

after the i/o:
 num threads   name
 -----------   ----
      5      ll_log_comt_*
      8      ldlm_bl_*
      8      ldlm_cb_*
     15      ldlm_cn_*
    304      ll_ost_*
    512      ll_ost_io_*

the /tmp/lustre-log.* files are kinda huge. they're at
  http://www.cita.utoronto.ca/~rjh/lustre/

there's no LBUG if I restrict the number of threads with eg:
  options ost oss_num_threads=300

let me know if you'd like more info or to try patches etc.

cheers,
robin

Feb 10 19:20:41 xe kernel: LustreError: 
7489:0:(ost_handler.c:1555:ost_thread_init()) ASSERTION(thread->t_id < 
OSS_THREADS_MAX) failed
Feb 10 19:20:41 xe kernel: LustreError: 
7489:0:(tracefile.c:433:libcfs_assertion_failed()) LBUG
Feb 10 19:20:41 xe kernel: Lustre: 
7489:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 
7489
Feb 10 19:20:41 xe kernel: ll_ost_io_512 R  running task       0  7489      1   
             7488 (L-TLB)
Feb 10 19:20:41 xe kernel: 000001013edf7ee8 000001011b21e070 000001018e064280 
000001013edf7db8
Feb 10 19:20:41 xe kernel:        0000010037d01600 0000000000000200 
0000000000000000 0000000000000000
Feb 10 19:20:41 xe kernel:        0000000000000459 00000000ffffffff
Feb 10 19:20:41 xe kernel: Call Trace:<ffffffffa03ca547>{:ptlrpc:ptlrpc_main+0} 
<ffffffff80110e1b>{child_rip+0}
Feb 10 19:20:41 xe kernel:
Feb 10 19:20:41 xe kernel: LustreError: dumping log to 
/tmp/lustre-log.1171095641.7489
Feb 10 19:20:42 xe kernel: Lustre: 
7489:0:(linux-debug.c:98:libcfs_run_upcall()) Invoked LNET upcall 
/usr/lib/lustre/lnet_upcall 
LBUG,/scratch/lbuild-boulder/lbuild-v1_5_97_3-2.6-rhel4-x86_64/lbuild/BUILD/lustre-1.5.97/lnet/libcfs/tracefile.c,libcfs_assertion_failed,433
Feb 10 19:22:21 xe kernel: Lustre: 0:0:(watchdog.c:130:lcw_cb()) Watchdog 
triggered for pid 6797: it was inactive for 100000ms
Feb 10 19:22:21 xe kernel: Lustre: 
0:0:(linux-debug.c:166:libcfs_debug_dumpstack()) showing stack for process 6797
Feb 10 19:22:21 xe kernel: ll_ost_io_231 S 0000000000000202     0  6797      1  
        6798  6796 (L-TLB)
Feb 10 19:22:21 xe kernel: 000001013edf7d78 0000000000000046 000001018e064280 
0000010100000074
Feb 10 19:22:21 xe kernel:        000001013edf7ee8 0000000080110dbe 
0000010001071aa0 000000011b21e070
Feb 10 19:22:21 xe kernel:        000001013d4ab800 000000000000124e
Feb 10 19:22:21 xe kernel: Call 
Trace:<ffffffffa03ca3c5>{:ptlrpc:ptlrpc_start_thread+1522}
Feb 10 19:22:21 xe kernel:        <ffffffff801331a5>{default_wake_function+0} 
<ffffffffa03c858d>{:ptlrpc:ptlrpc_server_free_request+34}
Feb 10 19:22:21 xe kernel:        <ffffffff80115d7b>{do_gettimeofday+77} 
<ffffffffa03cacec>{:ptlrpc:ptlrpc_main+1957}
Feb 10 19:22:21 xe kernel:        
<ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0} 
<ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0}
Feb 10 19:22:21 xe kernel:        
<ffffffffa03c9a47>{:ptlrpc:ptlrpc_retry_rqbds+0} <ffffffff80110e23>{child_rip+8}
Feb 10 19:22:21 xe kernel:        <ffffffffa03ca547>{:ptlrpc:ptlrpc_main+0} 
<ffffffff80110e1b>{child_rip+0}
Feb 10 19:22:21 xe kernel:
Feb 10 19:22:21 xe kernel: LustreError: dumping log to 
/tmp/lustre-log.1171095741.6797
Feb 10 19:29:26 x19 kernel: Lustre: 
4761:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 30s
Feb 10 19:29:26 x19 kernel: Lustre: 
4761:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 1 previous similar message
Feb 10 19:34:29 x19 kernel: Lustre: 
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:29 x19 kernel: Lustre: 
4777:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:29 x19 kernel: Lustre: 
4783:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 31s
Feb 10 19:34:30 x19 kernel: Lustre: 
4783:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 5 previous similar 
messages
Feb 10 19:34:30 x19 kernel: Lustre: 
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 2 previous similar 
messages
Feb 10 19:40:27 x19 kernel: Lustre: 
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 38s
Feb 10 19:40:27 x19 kernel: Lustre: 
4786:0:(lustre_fsfilt.h:284:fsfilt_setattr()) testfs-MDT0000: slow setattr 38s
Feb 10 19:40:27 x19 kernel: Lustre: 
4756:0:(lustre_fsfilt.h:284:fsfilt_setattr()) Skipped 4 previous similar 
messages

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss

Reply via email to