Hi Oleg,

thanks for your reply. I'm not able to reproduce this error at will, though. Our users have reported missing files, but I couldn't correlate these with the ll_inode_revalidate_fini errors, at least not directly. In fact, some of the missing files reappeared later, as reported in bug 16377, while others are gone for good.

In comment #29 of bug 16377, Brian Murrell stated that this can be caused by on-disk corruption. During our last downtime a month ago, a file system check on the MDT claimed to correct a large number of problems. (The disappearance of files wasn't correlated with this fsck ;-)). So I'm still not reassured about the health of this MDT.

We are running Lustre 1.6.7.2 on the servers; most of the clients are still on 1.6.5.1.

Regards,
Thomas

Oleg Drokin wrote:
> Hello!
>
> On Aug 6, 2009, at 12:57 PM, Thomas Roth wrote:
>
>> Hi,
>> these ll_inode_revalidate_fini errors are unfortunately quite known to
>> us.
>> So what would you guess if that happens again and again, on a number of
>> clients - MDT softly dying away?
>
> No, I do not think this is MDT problem of any sort at present, more
> like some strange client interaction.
> Are there any negative side effects in your case aside from log clutter?
> Jobs failing or anything like that?
>
>> Because we haven't seen any mass evictions (and no reasons for that) in
>> connection with these errors.
>> Or could the problem with the cached open files also be present if the
>> communication interruption does not show up as an eviction in the logs?
>
> It has nothing to do with opened files if there are no evictions.
> I checked in bugzilla and found bug 16377 which looks like this report
> too. Though the logs in there are somewhat confusing.
> It almost appears as if the failing dentry is reported as a mountpoint
> by vfs, but then it is not, since following inode_revalidate call
> ends up on lustre again.
> Do you have "lookup on mtpt" sort of errors coming from namei.c?
> If you can reproduce the problem with ls or another tool at will,
> can you please execute this on a client (comment #17 in the bug 16377):
> # script
> Script started, file is typescript
> # lctl clear
> # echo -1 > /proc/sys/lnet/debug
> [ reproduce problem ]
> # lctl dk > /tmp/ls.debug
> # exit
> Script done, file is typescript
>
> and attach your resulting ls.debug in the bug?
>
> Also what lustre version are you using?
>
> Bye,
> Oleg

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
