On Mar 04, 2008 19:52 +0100, Harald van Pee wrote:
> I have updated all clients to patched version 1.6.1; the servers are still
> 1.6.0.1. No Lustre-related error message has occurred since (2 weeks).
>
> I think it's reasonable (necessary?) to e2fsck all OSTs and the MDT?
> The MDT resides on a DRBD device configured as failover.
>
> I now have the following questions.
> 1. Is there a recommended order to do the file system checks? MDT first and
> then the OSTs, or vice versa?
>
> 2. If I umount the MDT, should I use -f? I assume there will be no file
> system access possible until the MDT is back again. Would it be better to
> umount all servers and clients and then the MDT?
>
> 3. I think each OST can be checked while the others are working, but I am
> unsure if I should use -f to umount or not?
>
> 4. Should I unmount all clients? If this is recommended anyway, it's maybe
> better to stop file system access for a couple of hours (2TB 70% used), but
> do the filesystem checks in parallel.

If you are expecting to fix the filesystem, it is best to just unmount
everything and run e2fsck in parallel. Alternatively, you can just force
unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs
are restarted, but this can be more troublesome in some cases.

> On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> > On Jan 21, 2008 18:55 +0100, Harald van Pee wrote:
> > > The directory is just not there! Directory or file not found.
> > >
> > > In my opinion there is no error message on the clients which is directly
> > > related to the problem on our node0010. Today I have seen this problem
> > > several times. Mostly the directory is not seen! Probably all of the other
> > > directories can be accessed at the same time.
> > >
> > > And here are all Lustre-related messages from the last days (others are
> > > mostly timestamps!)
> > >
> > > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
> > > (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800 alias
> >
> > A quick search in bugzilla for this error message shows bug 12123,
> > which is fixed in the 1.6.1 release, and also has a patch.
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger
> > Sr. Staff Engineer, Lustre Group
> > Sun Microsystems of Canada, Inc.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
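
The advice in the reply above (unmount everything, then run e2fsck on all
targets in parallel) could be scripted roughly as follows. This is only a
sketch under assumptions, not a procedure from the thread: the device paths
are hypothetical placeholders, it assumes every target is already unmounted
and visible from the host running the script (in practice each server would
check its own devices), and it defaults to a read-only check (-n) so nothing
is modified until the output has been reviewed.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical device paths; substitute the real MDT and OST block devices.
TARGETS = {
    "mdt": "/dev/drbd0",
    "ost0": "/dev/sdb1",
    "ost1": "/dev/sdc1",
}


def build_fsck_command(device, read_only=True):
    """Return an e2fsck argument list for one unmounted target device."""
    # -f forces a check even if the filesystem is marked clean.
    # -n answers "no" to every question (report only);
    # -p automatically repairs problems that are safe to fix.
    return ["e2fsck", "-f", "-n" if read_only else "-p", device]


def describe_exit_status(code):
    """Summarise e2fsck's bitmask exit status (see the e2fsck man page)."""
    if code == 0:
        return "clean"
    if code & 4:
        return "errors left uncorrected"
    if code & (8 | 16 | 32 | 128):
        return "e2fsck did not complete"
    return "errors corrected"  # only bits 1 and/or 2 are set


def check_all(targets, runner=subprocess.call, read_only=True):
    """Run e2fsck on every target concurrently; return {name: exit status}."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        futures = {
            name: pool.submit(runner, build_fsck_command(dev, read_only))
            for name, dev in targets.items()
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Calling check_all(TARGETS) and passing each returned status through
describe_exit_status() gives a per-target summary. The runner parameter exists
only so the command construction can be exercised without touching real
devices.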
