We had filesystem corruption back in February, and we've been trying to salvage things since then. I've spent the past month slowly draining the corrupt OST, and over the weekend it finally finished. An lfs find on the filesystem says that no files are stored on that OST. Yet the OST is still 100% full, and if I mount it as ldiskfs I can see a little over five million files in O/*/*. Most of them have numbers as names, and some of them are named LAST_ID. All of the numbered files seem to be user data, with owners and real data in them (based on ls and the find command).
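Before deleting anything, a dry-run inventory of the object directory can confirm what a removal would touch. The following is a minimal Python sketch, not a Lustre tool. It assumes the OST is mounted as ldiskfs (read-only is safest for a dry run) at a hypothetical path such as /mnt/ost_ldiskfs, and that object files are regular files with all-digit names somewhere under O/. It walks recursively, so it does not depend on the exact subdirectory depth. It only reads metadata and deletes nothing. It also counts all-digit names that the option three glob listed below, [1234567890]*[1234567890], would miss: that glob needs at least two characters, so a single-digit name such as 7 does not match it.

```python
import fnmatch
import os
import stat
import sys

# Glob from option three, checked with fnmatchcase (shell-style, case-sensitive).
OPTION_THREE_GLOB = "[1234567890]*[1234567890]"


def scan_object_dir(root):
    """Dry run: walk root (e.g. <ldiskfs mount>/O) and classify files. Nothing is deleted."""
    counts = {
        "numeric": 0,              # regular files with all-digit (ASCII) names
        "numeric_bytes": 0,        # apparent size (st_size) of those files
        "numeric_glob_misses": 0,  # all-digit names the option three glob would NOT match
        "last_id": 0,              # files named LAST_ID
        "other": 0,                # anything else: inspect by hand
    }
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if name == "LAST_ID":
                counts["last_id"] += 1
            elif name.isascii() and name.isdigit() and stat.S_ISREG(st.st_mode):
                counts["numeric"] += 1
                counts["numeric_bytes"] += st.st_size
                if not fnmatch.fnmatchcase(name, OPTION_THREE_GLOB):
                    counts["numeric_glob_misses"] += 1
            else:
                counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical default path; pass the real ldiskfs mount's O directory instead.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/ost_ldiskfs/O"
    for key, value in scan_object_dir(root).items():
        print(f"{key}: {value}")
```

Usage would be something like `python3 scan_ost_objects.py /mnt/ost_ldiskfs/O` (the script name is hypothetical). A nonzero numeric_glob_misses would mean the glob leaves some objects behind, and a nonzero other count is worth looking at before removing anything.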
I would like to clean out this OST and re-add it to Lustre, but I'm unsure how best to approach this. I see several options:

OPTION ONE: Run lfsck against the entire filesystem with the full, previously corrupt OST mounted.
OPTION TWO: Run lfsck against only the corrupt OST, in the hope that it cleans up all of the orphans on that OST.
OPTION THREE: Mount the OST as ldiskfs, remove O/*/[1234567890]*[1234567890], and then remount the filesystem.
OPTION FOUR: newfs the bad OST and re-add it, losing the old index.

We tried option one once before, and it killed cluster jobs because it made files unreadable while they were in use. Option two might avoid that, since it would not touch existing files. Option three sounds like it will work, based on my limited knowledge of how Lustre works, and would probably be the most expedient method. Option four is annoying because it leaves a hole in our OST indices that is upsetting to our OCD tendencies.

Any and all advice is appreciated here. Thank you.

--
Schlake
Sysadmin IV, NRAO
Work: 575-835-7281 (BACK IN THE OFFICE!)
Cell: 575-517-5668 (out of work hours)