Would using hard links work, instead of mv?

Malcolm.
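A minimal sketch of the hard-link idea, for illustration only. It assumes a plain-text list of affected paths, one per line, relative to the filesystem root. The function name `shadow_link` and the list file are hypothetical and not from the thread. It is untested on Lustre: `ln` may need to stat the source and could fail with the same "Cannot send after transport endpoint shutdown" error that `mv` reports, which is exactly the open question. The reasoning behind it is that a second link in a root-only shadow tree keeps the inode alive, so a user's `rm` removes only one name.

```shell
#!/bin/sh
# Hypothetical helper: hard-link every path listed in LIST (one per line,
# relative to SRC_ROOT) into the same relative location under SHADOW_ROOT.
# The originals are left in place; nothing is moved or removed.
shadow_link() {
    list=$1 src_root=$2 shadow_root=$3
    failed=0
    while IFS= read -r f; do
        [ -n "$f" ] || continue                 # skip blank lines
        dest=$shadow_root/$f
        # create the parent directory of the new link inside the shadow tree
        mkdir -p -- "${dest%/*}" || { failed=1; continue; }
        # report a failed link and carry on with the rest of the list
        ln -- "$src_root/$f" "$dest" || { echo "failed: $f" >&2; failed=1; }
    done < "$list"
    return "$failed"
}
```

It could be called as `shadow_link /root/dead_ost_files.txt /lustre /lustre/shadow`, where the list path is a placeholder. Trying it on a single affected file first would show whether `ln` gets any further than `mv` does with the OST deactivated.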
On 20/6/18, 1:34 am, "lustre-discuss on behalf of Robin Humble" <[email protected] on behalf of [email protected]> wrote:

Hi,

so we've maybe lost 1 OST out of a filesystem with 115 OSTs. we may still be able to get the OST back, but it's been a month now so there's pressure to get the cluster back and working and leave the files missing for now...

the complication is that because the OST might come back to life we would like to avoid the users rm'ing their broken files and potentially deleting them forever.

lustre is 2.5.41 ldiskfs centos6.x x86_64.

ideally I think we'd move all the ~2M files on the OST to a root access only "shadow" directory tree in lustre that's populated purely with files from the dead OST. if we manage to revive the OST then these can magically come back to life and we can mv them back into their original locations.

but currently

    mv: cannot stat 'some_file': Cannot send after transport endpoint shutdown

the OST is deactivated on the client. the client hangs if the OST isn't deactivated. the OST is still UP & activated on the MDS.

is there a way to mv files when their OST is unreachable? seems like mv is an MDT operation so it should be possible somehow?

the only thing I've thought of seems pretty out there... mount the MDT as ldiskfs and mv the affected files into the shadow tree at the ldiskfs level. ie. with lustre running and mounted, create an empty shadow tree of all dirs under eg. /lustre/shadow/, and then at the ldiskfs level on the MDT:

    for f in <list_of_2m_files>; do
        mv "/mnt/mdt0/ROOT/$f" "/mnt/mdt0/ROOT/shadow/$f"
    done

would that work? maybe we'd also have to rebuild OIs and lfsck - something along the lines of the MDT restore procedure in the manual. hopefully that would all work with an OST deactivated.

alternatively, should we just unlink all the currently dead files from lustre now, and then if the OST comes back can we reconstruct the paths and filenames from the FID in xattrs on the revived OST?
I suspect unlink is final though and this wouldn't work... ?

we can also take an lvm snapshot of the MDT and refer to that later I suppose, but I'm not sure how that might help us.

as you can probably tell I haven't had to deal with this particular situation before :)

thanks for any help.

cheers,
robin
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
