On 2010-01-04, at 03:02, David Cohen wrote:
> I'm using a mixed environment of 1.8.0.1 MDS and 1.6.6 OSS's (had a problem
> with qlogic drivers and rolled back to 1.6.6).
> My MDS get unresponsive each day at 4-5 am local time, no kernel panic or
> error messages before.

Judging by the time, I'd guess this is "slocate" or "mlocate" running on all of your clients at the same time. This used to be a source of extremely high load back in the old days, but I thought that Lustre was in the exclude list in newer versions of *locate. Looking at the installed mlocate on my system, that doesn't seem to be the case... strange.

> Some errors and an LBUG appear in the log after force booting the MDS and
> mounting the MDT and then the log is clear until next morning:
>
> Jan 4 06:33:31 tech-mds kernel: LustreError: 6357:0:(class_hash.c:225:lustre_hash_findadd_unique_hnode()) ASSERTION(hlist_unhashed(hnode)) failed
> Jan 4 06:33:31 tech-mds kernel: LustreError: 6357:0:(class_hash.c:225:lustre_hash_findadd_unique_hnode()) LBUG
> Jan 4 06:33:31 tech-mds kernel: Lustre: 6357:0:(linux-debug.c:222:libcfs_debug_dumpstack()) showing stack for process 6357
> Jan 4 06:33:31 tech-mds kernel: ll_mgs_02 R running task 0 6357 1 6340 (L-TLB)
> Jan 4 06:33:31 tech-mds kernel: Call Trace:
> Jan 4 06:33:31 tech-mds kernel: thread_return+0x62/0xfe
> Jan 4 06:33:31 tech-mds kernel: __wake_up_common+0x3e/0x68
> Jan 4 06:33:31 tech-mds kernel: :ptlrpc:ptlrpc_main+0x1218/0x13e0
> Jan 4 06:33:31 tech-mds kernel: default_wake_function+0x0/0xe
> Jan 4 06:33:31 tech-mds kernel: audit_syscall_exit+0x31b/0x336
> Jan 4 06:33:31 tech-mds kernel: child_rip+0xa/0x11
> Jan 4 06:33:31 tech-mds kernel: :ptlrpc:ptlrpc_main+0x0/0x13e0
> Jan 4 06:33:31 tech-mds kernel: child_rip+0x0/0x11

It shouldn't LBUG during recovery, however.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
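
For anyone wanting to check their own clients: mlocate's updatedb reads its exclusions from /etc/updatedb.conf, where PRUNEFS is a whitespace-separated, case-insensitive list of filesystem types to skip. The following is only a rough sketch of testing whether "lustre" appears in that list. The helper names and the sample configuration are made up for illustration, and the parsing is deliberately simplified (it merges multiple PRUNEFS lines and treats any "#" as the start of a comment, which may not match updatedb's own behaviour exactly).

```python
import re


def prunefs_entries(conf_text):
    """Return the lower-cased filesystem types found in PRUNEFS lines.

    Simplified parser (an assumption, not mlocate's own logic): it expects
    lines of the form  PRUNEFS = "type1 type2 ..."  and merges the entries
    if more than one such line is present.
    """
    entries = []
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r'^PRUNEFS\s*=\s*"(.*)"\s*$', line)
        if match:
            entries.extend(name.lower() for name in match.group(1).split())
    return entries


def excludes_lustre(conf_text):
    """True if updatedb would be told to skip filesystems of type lustre."""
    return "lustre" in prunefs_entries(conf_text)


# Hypothetical configuration, for illustration only.
SAMPLE_CONF = '''
PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "nfs NFS nfs4 afs proc sysfs tmpfs"
PRUNEPATHS = "/tmp /var/spool /media"
'''

if __name__ == "__main__":
    print("lustre excluded:", excludes_lustre(SAMPLE_CONF))
```

To check a real client, pass the contents of /etc/updatedb.conf to excludes_lustre(). If it returns False, adding lustre to PRUNEFS on every client (or staggering the cron job) should keep the nightly scan off the MDS.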