Hi, I think you might be hitting this: http://jira.whamcloud.com/browse/LU-952 
You can find the patch in that ticket.
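
As a side note on the client log: the -16 reported for ost_connect is a negated Linux errno value, and errno 16 is EBUSY. Below is a minimal sketch for decoding such codes. The helper name describe_lustre_rc is made up for illustration and is not part of Lustre.

```python
import errno
import os


def describe_lustre_rc(rc: int) -> str:
    """Map a negated errno, as printed in Lustre log lines, to its name and text.

    Hypothetical helper for illustration only; it is not a Lustre API.
    """
    code = abs(rc)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # The client log in this thread reports "ost_connect operation failed with -16".
    print(describe_lustre_rc(-16))
```

On Linux this reports EBUSY for -16, which would be consistent with the OST still being busy with the client's earlier requests when it tries to reconnect.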

Regards
Liang

On May 30, 2012, at 11:21 AM, huangql wrote:

> Dear  all,
> 
> Recently we found a problem on our OSS: some service threads appear to hang 
> when the server is under heavy I/O load. When this happens, some clients are 
> evicted or refused by some OSTs, and we see the following error messages:
> 
> Server side:
> 
> May 30 11:06:31 boss07 kernel: Lustre: Service thread pid 8011 was inactive 
> for 200.00s. The thread might be hung, or it might only be slow and will 
> resume later. Dumping the stack trace for debugging purposes:
> May 30 11:06:31 boss07 kernel: Lustre: Skipped 1 previous similar message
> May 30 11:06:31 boss07 kernel: Pid: 8011, comm: ll_ost_71 
> May 30 11:06:31 boss07 kernel: 
> May 30 11:06:31 boss07 kernel: Call Trace:
> May 30 11:06:31 boss07 kernel:  [<ffffffff886f5d0e>] 
> start_this_handle+0x301/0x3cb [jbd2]
> May 30 11:06:31 boss07 kernel:  [<ffffffff800a09ca>] 
> autoremove_wake_function+0x0/0x2e
> May 30 11:06:31 boss07 kernel:  [<ffffffff886f5e83>] 
> jbd2_journal_start+0xab/0xdf [jbd2]
> May 30 11:06:31 boss07 kernel:  [<ffffffff888ce9b2>] 
> fsfilt_ldiskfs_start+0x4c2/0x590 [fsfilt_ldiskfs]
> May 30 11:06:31 boss07 kernel:  [<ffffffff88920551>] 
> filter_version_get_check+0x91/0x2a0 [obdfilter]
> May 30 11:06:31 boss07 kernel:  [<ffffffff80036cf4>] __lookup_hash+0x61/0x12f
> May 30 11:06:31 boss07 kernel:  [<ffffffff8893108d>] 
> filter_setattr_internal+0x90d/0x1de0 [obdfilter]
> May 30 11:06:31 boss07 kernel:  [<ffffffff800e859b>] lookup_one_len+0x53/0x61
> May 30 11:06:31 boss07 kernel:  [<ffffffff88925452>] 
> filter_fid2dentry+0x512/0x740 [obdfilter]
> May 30 11:06:31 boss07 kernel:  [<ffffffff88924e27>] 
> filter_fmd_get+0x2b7/0x320 [obdfilter]
> May 30 11:06:31 boss07 kernel:  [<ffffffff8003027b>] __up_write+0x27/0xf2
> May 30 11:06:31 boss07 kernel:  [<ffffffff88932721>] 
> filter_setattr+0x1c1/0x3b0 [obdfilter]
> May 30 11:06:31 boss07 kernel:  [<ffffffff8882677a>] 
> lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff8881e658>] 
> ptlrpc_send_reply+0x5c8/0x5e0 [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff88822b05>] 
> lustre_msg_get_version+0x35/0xf0 [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff888b0abb>] ost_handle+0x25db/0x55b0 
> [ost]
> May 30 11:06:31 boss07 kernel:  [<ffffffff80150d56>] __next_cpu+0x19/0x28
> May 30 11:06:31 boss07 kernel:  [<ffffffff800767ae>] 
> smp_send_reschedule+0x4e/0x53
> May 30 11:06:31 boss07 kernel:  [<ffffffff8883215a>] 
> ptlrpc_server_handle_request+0x97a/0xdf0 [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff888328a8>] 
> ptlrpc_wait_event+0x2d8/0x310 [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff8008b3bd>] 
> __wake_up_common+0x3e/0x68
> May 30 11:06:31 boss07 kernel:  [<ffffffff88833817>] ptlrpc_main+0xf37/0x10f0 
> [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff8005dfb1>] child_rip+0xa/0x11
> May 30 11:06:31 boss07 kernel:  [<ffffffff888328e0>] ptlrpc_main+0x0/0x10f0 
> [ptlrpc]
> May 30 11:06:31 boss07 kernel:  [<ffffffff8005dfa7>] child_rip+0x0/0x11
> May 30 11:06:31 boss07 kernel:
> May 30 11:06:31 boss07 kernel: LustreError: dumping log to 
> /tmp/lustre-log.1338347191.8011
> 
> 
> Client side:
> 
> May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while 
> communicating with 192.168.50.123@tcp. The ost_connect operation failed with 
> -16
> 
> While this error is being reported, commands such as "ls", "df", "vi", and 
> "touch" hang, so we cannot do anything in the file system.
> I think an ost_connect failure should return an error message to the user 
> instead of leaving interactive commands stuck.
> 
> Could someone give us some advice or any suggestions to solve this problem?
> 
> Thank you very much in advance.
> 
> 
> Best Regards
> Qiulan Huang
> 2012-05-30
> ====================================================================
> Computing center,the Institute of High Energy Physics, China
> Huang, Qiulan                        Tel: (+86) 10 8823 6010-105
> P.O. Box 918-7                       Fax: (+86) 10 8823 6839
> Beijing 100049  P.R. China           Email: [email protected]
> ===================================================================   
> 
> 
> 
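
To see how often the watchdog message quoted above shows up on an OSS, something like the following could be run over the syslog. It is only a sketch: it assumes each kernel message sits on a single unwrapped line (unlike the mail-wrapped excerpt above), and the function name find_stalled_threads is made up for illustration.

```python
import re

# Matches the Lustre watchdog message seen in the quoted server log.
WATCHDOG = re.compile(r"Service thread pid (\d+) was inactive for ([\d.]+)s")


def find_stalled_threads(lines):
    """Return (pid, inactive_seconds) for every watchdog message found.

    Hypothetical helper; assumes one complete syslog message per line.
    """
    hits = []
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            hits.append((int(match.group(1)), float(match.group(2))))
    return hits


if __name__ == "__main__":
    sample = [
        "May 30 11:06:31 boss07 kernel: Lustre: Service thread pid 8011 "
        "was inactive for 200.00s. The thread might be hung, or it might "
        "only be slow and will resume later.",
        "May 30 11:06:31 boss07 kernel: Pid: 8011, comm: ll_ost_71",
    ]
    for pid, seconds in find_stalled_threads(sample):
        print(f"pid {pid} inactive for {seconds:.2f}s")
```

Counting these per OSS and comparing the timestamps with the client-side eviction messages may help confirm whether the stalls line up with the heavy I/O periods.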

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
