On Wednesday 05 March 2008 01:06 am, Andreas Dilger wrote:
> On Mar 04, 2008 19:52 +0100, Harald van Pee wrote:
> > I have updated all clients to patched version 1.6.1, the servers still
> > are 1.6.0.1. No lustre related error message has occurred since (2 weeks).
> >
> > I think it's reasonable (necessary?) to e2fsck all osts and the mdt?
> > The mdt resides on a drbd device configured as failover.
> >
> > I now have the following questions.
> > 1. Is there a recommended order to do the file system checks? mdt first
> > and then the osts or vice versa?
> >
> > 2. If I umount the mdt should I use -f ? I assume there will be no file
> > system access possible until the mdt is back again. Would it be better
> > to umount all servers and clients and then the mdt?
> >
> > 3. I think each ost can be checked while the others are working, but I
> > am unsure if I should use -f to umount or not?
> >
> > 4. should I unmount all clients? If this is recommended anyway, it's
> > maybe better to stop file system access for a couple of hours (2TB 70%
> > used), but do the filesystem checks in parallel.
>
> If you are expecting to fix the filesystem, it is best to just unmount
> everything and run e2fsck in parallel. Alternately, you can just force
> unmount the MDT+OST filesystems and let the clients hang until the MDT+OSTs
> are restarted, but this can be more troublesome in some cases.

o.k. thanks, then I will unmount all clients first, then all osts, and the
mdt last. If possible, should I try to avoid the -f flag?

> > > On Monday 21 January 2008 11:55 pm, Andreas Dilger wrote:
> > > On Jan 21, 2008 18:55 +0100, Harald van Pee wrote:
> > > > The directory is just not there! Directory or file not found.
> > > >
> > > > in my opinion there is no error message on the clients which is
> > > > directly related to the problem. On our node0010 today I have seen
> > > > this problem several times. Mostly the directory is not seen!
> > > > Probably all of the other directories can be accessed at the same
> > > > time.
> > > >
> > > > and here all lustre related messages from the last days (others are
> > > > mostly timestamps!)
> > > >
> > > > Jan 17 07:41:16 node0010 kernel: Lustre: 5723:0:
> > > > (namei.c:235:ll_mdc_blocking_ast()) More than 1 alias dir 133798800
> > > > alias
> > >
> > > A quick search in bugzilla for this error message shows bug 12123,
> > > which is fixed in the 1.6.1 release, and also has a patch.
> > >
> > > Cheers, Andreas
> > > --
> > > Andreas Dilger
> > > Sr. Staff Engineer, Lustre Group
> > > Sun Microsystems of Canada, Inc.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.

--
Harald van Pee
Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
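
The order discussed in this thread (clients first, then the osts, the mdt last, then e2fsck on all targets in parallel) can be sketched as a dry-run plan. This is only an illustration under stated assumptions: the host names, devices, and mount points are hypothetical, the use of ssh to reach each node is an assumption, and the read-only first pass (`e2fsck -f -n`) is a cautious choice of this sketch, not something the thread prescribes. The script only prints commands and runs nothing. Plain `umount` is shown without `-f`; whether `-f` is needed is left open in the thread.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Target:
    """One Lustre server target (OST or MDT). All values are hypothetical."""
    host: str
    device: str
    mountpoint: str


def build_plan(clients: List[str], client_mount: str,
               osts: List[Target], mdt: Target) -> List[Tuple[str, List[str]]]:
    """Return ordered phases; commands within one phase may run in parallel."""
    return [
        # Phase 1: stop all file system access from the clients.
        ("unmount clients",
         [f"ssh {c} umount {client_mount}" for c in clients]),
        # Phase 2: unmount the OSTs once no client is using them.
        ("unmount osts",
         [f"ssh {t.host} umount {t.mountpoint}" for t in osts]),
        # Phase 3: unmount the MDT last.
        ("unmount mdt",
         [f"ssh {mdt.host} umount {mdt.mountpoint}"]),
        # Phase 4: check every target in parallel.
        # -f forces a check even if the file system looks clean,
        # -n opens it read-only and answers "no" to every question.
        ("e2fsck (read-only pass)",
         [f"ssh {t.host} e2fsck -f -n {t.device}" for t in osts + [mdt]]),
    ]


# Hypothetical example layout, not taken from the thread.
osts = [Target("oss1", "/dev/sdb1", "/mnt/ost0"),
        Target("oss2", "/dev/sdb1", "/mnt/ost1")]
mdt = Target("mds1", "/dev/drbd0", "/mnt/mdt")
plan = build_plan(["node0010", "node0011"], "/mnt/lustre", osts, mdt)

for label, commands in plan:
    print(f"# {label}")
    for cmd in commands:
        print(cmd)
```

After reviewing the read-only output, a repairing run would replace `-n` with an option that allows fixes, using the Lustre-patched e2fsprogs on the servers.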
