Dear All,

I am trying to do something similar to what Kurt is doing, at the same time: I have 
attempted to decommission another OST since this thread started.

The symptom is that any attempt to create a file hangs indefinitely:


touch /lustre/atlas25/atlas/testfile

I have tried this with the OST mounted.
I have also tried this with the OST unmounted.
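In case it helps anyone reproduce this without tying up a shell, the same test can be run with a bounded wait. This is just a sketch: `timeout` is from GNU coreutils, and the path is the test file from above.

```shell
# Bounded version of the file-creation test: give up after 30 seconds
# instead of hanging indefinitely. The path is the test file used above.
tf=/lustre/atlas25/atlas/testfile
if timeout 30 touch "$tf" 2>/dev/null; then
    echo "create succeeded"
else
    echo "create failed or timed out"
fi
```

An exit status of 124 from `timeout` specifically indicates that the 30-second limit fired, which distinguishes a hang from an ordinary error.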

Does anyone have any other pointers?

For the OSTs I want to decommission, none of the options below works for me; the 
filesystem hangs indefinitely (in some cases I waited 20 minutes).  The OST is 
healthy as far as I know; it's just on old, out-of-warranty hardware, which is why 
I want to decommission it.  This process has previously worked for other OSTs 
in the filesystem.  In this new case, the OST being decommissioned has the 
lowest index in the filesystem; could that be the difference?



On the clients (thanks to this thread for this):

lctl set_param llite.atlas25-ffff880205397c00.lazystatfs=1

On the MDS:

lctl set_param -P osc.atlas25-OST0033-osc-MDT0000.active=0

or, where the MGS is separate from the MDS, on the MGS and on the clients:

lctl set_param osc.atlas25-OST0033-osc-MDT0000.active=0
lctl --device 7 deactivate
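For anyone following along, here is the sequence above gathered into one sketch. The filesystem name (atlas25) and OST index (OST0033) are from my system; substitute your own. The `lctl` calls are commented out since they only make sense on live Lustre nodes.

```shell
# Sketch of the deactivation sequence. fs/ost are the filesystem name
# and OST index from my system; substitute your own values.
fs=atlas25
ost=OST0033
param="osc.${fs}-${ost}-osc-MDT0000.active"
echo "$param"   # the tunable being set to 0

# On every client, so that df does not hang on the dead OST:
#   lctl set_param llite.${fs}-*.lazystatfs=1
#   lctl set_param osc.${fs}-${ost}-osc-*.active=0
# On the MDS (persistent across remounts; requires the MGS to be reachable):
#   lctl set_param -P ${param}=0
```

The `-P` form records the setting on the MGS so it survives remounts, whereas the plain `set_param` only affects the running node.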



Thanks,
Sean

________________________________
From: lustre-discuss [[email protected]] on behalf of 
Kurt Strosahl [[email protected]]
Sent: 26 January 2016 18:31
To: Alexander I Kulyavtsev
Cc: <[email protected]>
Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds

Unfortunately it was the pool under the OST that was corrupted, not the OST. I 
couldn't import it due to corruption on the pool.

Kurt J. Strosahl
System Administrator
Scientific Computing Group, Thomas Jefferson National Accelerator Facility

----- Original Message -----
From: Alexander I Kulyavtsev
To: Kurt Strosahl
Cc: Alexander I Kulyavtsev
Sent: Tue, 26 Jan 2016 13:23:20 -0500 (EST)
Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds

Hi Kurt,
Probably too late if you unlinked the files: did you do a zfs snapshot on the 
MDT and the damaged OST before removing files? If so, it may be possible to 
mount the OST zfs as a regular zfs and pull out the objects corresponding to 
files, using an MDT zfs snapshot to get the fids.

Alex.

On Jan 22, 2016, at 7:39 AM, Kurt Strosahl wrote:
> Good Morning,
>
> The real issue here is that the OST was decommissioned because the zpool on 
> which it resided died, which left about 30TB of data (and possibly several 
> million files) to be scrubbed.
>
> The steps I took were as follows... I set active=0 on the mds, and then set 
> lazystatfs=1 on the mds and the clients so that df commands wouldn't hang.
>
> I don't see in the documentation where you have to set the ost to active=0 
> on every client, did I miss that? Also that is a marked change from 1.8, 
> where deactivating an OST just required active=0 on the mds.
>
> w/r,
> Kurt
>
> ----- Original Message -----
> From: "Sean Brisbane"
> To: "Kurt Strosahl", "Chris Hunter"
> Cc: [email protected]
> Sent: Friday, January 22, 2016 4:33:41 AM
> Subject: RE: Inactivated ost still showing up on the mds
>
> Dear Kurt,
>
> I'm not sure if this is exactly what you were trying to do, but when I 
> decommission an OST I also deactivate the OST on the client, which means 
> that nothing on the OST will be accessible but the filesystem will carry 
> on happily.
>
> lctl set_param osc.lustresystem-OST00NN-osc*.active=0
>
> Thanks,
> Sean
>
> ________________________________________
> From: lustre-discuss [[email protected]] on behalf of 
> Kurt Strosahl [[email protected]]
> Sent: 21 January 2016 18:09
> To: Chris Hunter
> Cc: [email protected]
> Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds
>
> Good Afternoon Chris,
>
> I have already run the active=0 command on the mds, is there another step? 
> From my testing under 2.5.3 the clients will hang indefinitely without 
> using the lazystatfs=1.
>
> Our major issue at present is that when the OST died it had a fair amount 
> of data on it (closing in on 2M files lost), and it seems like the client 
> gets into a bad state when calls are made repeatedly to files that are 
> lost (but still have their ost index information). As the crawl has 
> unlinked files the number of errors has dropped, as have client crashes.
>
> w/r,
> Kurt
>
> ----- Original Message -----
> From: "Chris Hunter"
> To: [email protected]
> Cc: "Kurt Strosahl"
> Sent: Thursday, January 21, 2016 12:50:03 PM
> Subject: [lustre-discuss] Inactivated ost still showing up on the mds
>
> Hi Kurt,
> For reference, when an underlying OST object is missing, this is the 
> error message generated on our MDS (lustre 2.5):
>> Lustre: 12752:0:(mdd_object.c:1983:mdd_dir_page_build()) build page failed: -5!
>
> I suspect until you update the MGS info the MDS will still connect to 
> the deactive OST.
>
> My experience is that sometimes the recipe to deactivate an OST works 
> flawlessly, other times the clients hang on the "df" command and time 
> out on file access. I guess the order in which you run the commands 
> (ie. client vs server) is important.
>
> regards,
> chris hunter
>
>> From: Kurt Strosahl
>> To: [email protected]
>> Subject: [lustre-discuss] Inactivated ost still showing up on the mds
>>
>> All,
>>
>> Continuing the issues that I reported yesterday... I found that by 
>> unlinking lost files I was able to stop the below error from occurring, 
>> which gives me hope that systems will stop crashing once all the lost 
>> files are scrubbed.
>>
>> LustreError: 7676:0:(sec.c:379:import_sec_validate_get()) import ffff880623098800 (NEW) with no sec
>> LustreError: 7971:0:(sec.c:379:import_sec_validate_get()) import ffff880623098800 (NEW) with no sec
>>
>> I do note that the inactivated ost doesn't seem to ever REALLY go away. 
>> After I removed an ost from my test system I noticed that the mds still 
>> showed it...
>>
>> On a client hooked up to the test system:
>> client: lfs df
>> UUID                1K-blocks    Used    Available  Use%  Mounted on
>> testL-MDT0000_UUID  1819458432   10112   1819446272   0%  /testlustre[MDT:0]
>> testL-OST0000_UUID  57914433152  12672   57914418432  0%  /testlustre[OST:0]
>> testL-OST0001_UUID  57914433408  12672   57914418688  0%  /testlustre[OST:1]
>> testL-OST0002_UUID  57914433408  12672   57914418688  0%  /testlustre[OST:2]
>> OST0003             : inactive device
>> testL-OST0004_UUID  57914436992  144896  57914290048  0%  /testlustre[OST:4]
>>
>> on the mds it still shows as up when I do lctl dl:
>> mds: lctl dl | grep OST0003
>>   22 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
>>
>> So I stopped the test system, ran lctl dl again (getting no results), 
>> and restarted it. Once the system was back up I still saw OST3 marked 
>> as UP with lctl dl:
>> mds: lctl dl | grep OST0003
>>   11 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
>>
>> Why does the mds still think that this OST is up?
>>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
