On 2010-12-29, at 20:22, "Mervini, Joseph A" <[email protected]> wrote:
>
> And examining the LUN with tunefs.lustre produces the following:
>
> [r...@rio37 ~]# tunefs.lustre /dev/sdf
> checking for existing Lustre data: found last_rcvd
> tunefs.lustre: Unable to read 1.6 config /tmp/dirUvdBcz/mountdata.
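(A sketch, not from the original thread: the unreadable-config error above can be cross-checked at the ldiskfs level with debugfs from e2fsprogs, to see whether the mountdata file is missing entirely or present but unreadable. The dump path is an example; adjust to taste.)

```shell
# Read-only inspection of the Lustre config file on the raw OST device.
# -c opens the filesystem in catastrophic (read-only, no journal) mode;
# -R runs a single debugfs request.
debugfs -c -R "stat /CONFIGS/mountdata" /dev/sdf
debugfs -c -R "dump /CONFIGS/mountdata /tmp/mountdata.sdf" /dev/sdf
```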
That means the mountdata file is likely either missing or corrupted somehow.

> Read previous values:
> Target:
> Index:      54
> UUID:       ostr)o37sdf_UID
> Lustre FS:  lustre
> Mount type: ldiskfs
> Flags:      0x202
>             (OST upgrade1.4 )
> Persistent mount opts:
> Parameters:
>
> I suspected that there were file system inconsistencies, so I ran fsck on
> one of the targets and got a large number of errors, primarily
> "Multiply-claimed blocks", running e2fsck -fp. When it completed, the OS
> told me I needed to run fsck manually, which I did with the "-fy" options.
> This dumped a ton of inodes to lost+found. In addition, when it started it
> converted the file system from ext3 to ext2 during the fsck, and then
> recreated the journal when it completed.

There was some sort of device-level corruption in this case. The e2fsck fixed it as much as possible, and you should run ll_recover_lost_found_objs on the mounted filesystem.

> However, I was still unable to mount the LUN, and tunefs.lustre still had
> the FATAL condition shown above.
>
> I AM able to mount all of the LUNs as ldiskfs devices, so I suspect that
> the lustre config for those OSTs just got clobbered somehow. Also, looking
> at the inodes that were dumped to lost+found, most of them have timestamps
> that are more than a year old and that by policy should have been purged,
> so I'm wondering if it is just an artifact of the file system not being
> checked for a very long time.

That depends on atime, which is normally only updated on the MDS on disk.

> Other things to note: the OSS is Fibre Channel attached to a DDN 9500, and
> the OSTs that are having problems are associated with one controller of
> the couplet. That is suspicious, but because neither controller is showing
> any faults, I suspect that whatever has occurred did not happen recently.

It does seem to be the smoking gun.
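(For the lost+found recovery step, a minimal sketch: the OST is mounted as plain ldiskfs and ll_recover_lost_found_objs, which ships with Lustre, is pointed at its lost+found directory. The mount point here is my assumption.)

```shell
# Mount the OST directly as ldiskfs (mount point is an example).
mount -t ldiskfs /dev/sdf /mnt/ost
# Move recovered objects from lost+found back into the OST object
# hierarchy, using the object IDs recorded in the files themselves.
ll_recover_lost_found_objs -d /mnt/ost/lost+found
umount /mnt/ost
```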
> In addition, the /CONFIG/mountdata on all the targets originally had a
> timestamp of Aug 3 14:05 (and still does for the targets that can't be
> mounted).
>
> So I have two questions:
>
> How can I restore the config data on the OSTs that are having problems?

I think there was a thread on rebuilding the mountdata file recently.

> What does "Multiply-claimed blocks" mean and does it indicate corruption?

Disk-level corruption - it means two or more inodes claim the same block.

> I am afraid that running e2fsck may have compounded my problems and am
> holding off on doing any file system checks on the other 2 targets.

Well, it is needed at some point...

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
