Hi Kurt,

That's interesting; that's not what I see at all with my deactivated OSTs.  I'm 
running 2.5.3, but with ldiskfs, a separate host for MDS vs MDT, and I'm not 
using lazystatfs.  I'm not saying those things are or aren't related, but they 
are differences.

It's working for you now, though?  I'm certainly happy with my workaround for 
the purposes of decommissioning old working hardware, which is a different case 
from yours.  It's worth bearing yours in mind if we lose an OST.  Are you 
cleaning up the links using a Lustre client with the OST active, or by deleting 
the files directly from the MDS's ZFS filesystem?

Thanks,
Sean

-----Original Message-----
From: Kurt Strosahl [mailto:[email protected]] 
Sent: 22 January 2016 13:56
To: Sean Brisbane
Cc: Chris Hunter; [email protected]
Subject: Re: Inactivated ost still showing up on the mds

A followup...

I just checked on one of the client nodes, and it shows active=0 even without 
having set it by hand.

lctl get_param osc.lustre-OST0004-osc*.active
osc.lustre-OST0004-osc-ffff880872638400.active=0

So I set it on the client...
lctl set_param osc.lustre2-OST0004-osc*.active=0
osc.lustre2-OST0004-osc-ffff880872638400.active=0

and it still shows as UP when I use lctl dl | grep OST0004
  8 UP osc lustre2-OST0004-osc-

So why does Lustre 2 require the extra legwork for removing an OST?  It might 
not be a common occurrence, but it is something that will happen over the 
lifetime of a file system (due to system retirement or hardware failure).

w/r,
Kurt


----- Original Message -----
From: "Kurt Strosahl" <[email protected]>
To: "Sean Brisbane" <[email protected]>
Cc: "Chris Hunter" <[email protected]>, [email protected]
Sent: Friday, January 22, 2016 8:39:18 AM
Subject: Re: Inactivated ost still showing up on the mds

Good Morning,

   The real issue here is that the OST was decommissioned because the zpool on 
which it resided died, which left about 30TB of data (and possibly several 
million files) to be scrubbed.
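For what it's worth, a minimal sketch of the kind of scrub that implies (the 
mount point and OST UUID below are placeholders; `lfs find --ost` selects files 
that have objects on the given OST):

```shell
# Hypothetical sketch: from a client mount, list files with objects
# on the dead OST (UUID is a placeholder) and unlink them one at a
# time. Drop the xargs stage first to dry-run the file list.
lfs find /mnt/lustre --ost lustre-OST0004_UUID -print0 \
  | xargs -0 -r -n 1 unlink
```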

   The steps I took were as follows... I set active=0 on the mds, and then set 
lazystatfs=1 on the mds and the clients so that df commands wouldn't hang.
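Roughly, those two settings look like this (the filesystem name and OST index 
are placeholders; conf_param has to be issued on the MGS node):

```shell
# Permanently mark the dead OST inactive in the config logs
# (run on the MGS; "testfs" and the index are placeholders):
lctl conf_param testfs-OST0004.osc.active=0

# Let statfs/df skip unreachable OSTs instead of blocking
# (run wherever a client is mounted):
lctl set_param llite.*.lazystatfs=1
```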

   I don't see in the documentation where you have to set the OST to active=0 
on every client; did I miss that?  Also, that is a marked change from 1.8, 
where deactivating an OST just required active=0 on the mds.

w/r,
Kurt

----- Original Message -----
From: "Sean Brisbane" <[email protected]>
To: "Kurt Strosahl" <[email protected]>, "Chris Hunter" <[email protected]>
Cc: [email protected]
Sent: Friday, January 22, 2016 4:33:41 AM
Subject: RE: Inactivated ost still showing up on the mds

Dear Kurt,

I'm not sure if this is exactly what you were trying to do, but when I 
decommission an OST I also deactivate the OST on the client, which means that 
nothing on the OST will be accessible but the filesystem will carry on happily. 
 

 lctl set_param osc.lustresystem-OST00NN-osc*.active=0

Thanks,
Sean
________________________________________
From: lustre-discuss [[email protected]] on behalf of 
Kurt Strosahl [[email protected]]
Sent: 21 January 2016 18:09
To: Chris Hunter
Cc: [email protected]
Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds

Good Afternoon Chris,

   I have already run the active=0 command on the mds; is there another step?  
From my testing under 2.5.3, the clients will hang indefinitely without using 
lazystatfs=1.

   Our major issue at present is that when the OST died it had a fair amount of 
data on it (closing in on 2M files lost), and it seems like the client gets 
into a bad state when calls are made repeatedly to files that are lost (but 
still have their OST index information).  As the crawl has unlinked files, the 
number of errors has dropped, as have client crashes.

w/r,
Kurt

----- Original Message -----
From: "Chris Hunter" <[email protected]>
To: [email protected]
Cc: "Kurt Strosahl" <[email protected]>
Sent: Thursday, January 21, 2016 12:50:03 PM
Subject: [lustre-discuss] Inactivated ost still showing up on the mds

Hi Kurt,
For reference when an underlying OST object is missing, this is the error 
message generated on our MDS (lustre 2.5):
> Lustre: 12752:0:(mdd_object.c:1983:mdd_dir_page_build()) build page failed: 
> -5!

I suspect that until you update the MGS info, the MDS will still connect to the 
deactivated OST.
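In case it helps, the usual way to make that MGS update stick is a writeconf; a 
rough sketch (device paths are placeholders, and every target and client must 
be unmounted first):

```shell
# Hypothetical sketch: regenerate the Lustre config logs so the
# removed OST no longer appears in them. With the filesystem fully
# stopped, run on each surviving target:
tunefs.lustre --writeconf /dev/mdtdev    # on the MDS
tunefs.lustre --writeconf /dev/ostdev    # on each OSS, per OST
# Then remount in order: MGS/MDT first, then OSTs, then clients.
```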

My experience is that sometimes the recipe to deactivate an OST works 
flawlessly; other times the clients hang on the "df" command and time out on 
file access. I guess the order in which you run the commands (i.e. client vs 
server) is important.

regards,
chris hunter

> From: Kurt Strosahl <[email protected]>
> To: [email protected]
> Subject: [lustre-discuss] Inactivated ost still showing up on the mds
>
> All,
>
>    Continuing the issues that I reported yesterday...  I found that by 
> unlinking lost files that I was able to stop the below error from occurring, 
> this gives me hope that systems will stop crashing once all the lost files 
> are scrubbed.
>
> LustreError: 7676:0:(sec.c:379:import_sec_validate_get()) import 
> ffff880623098800 (NEW) with no sec
> LustreError: 7971:0:(sec.c:379:import_sec_validate_get()) import 
> ffff880623098800 (NEW) with no sec
>
>    I do note that the inactivated ost doesn't seem to ever REALLY go away.  
> After I removed an ost from my test system I noticed that the mds still 
> showed it...
>
> On a client hooked up to the test system...
> client: lfs df
> UUID                   1K-blocks        Used   Available Use% Mounted on
> testL-MDT0000_UUID    1819458432       10112  1819446272   0% 
> /testlustre[MDT:0]
> testL-OST0000_UUID   57914433152       12672 57914418432   0% 
> /testlustre[OST:0]
> testL-OST0001_UUID   57914433408       12672 57914418688   0% 
> /testlustre[OST:1]
> testL-OST0002_UUID   57914433408       12672 57914418688   0% 
> /testlustre[OST:2]
> OST0003             : inactive device
> testL-OST0004_UUID   57914436992      144896 57914290048   0% 
> /testlustre[OST:4]
>
> on the mds it still shows as up when I do lctl dl:
> mds: lctl dl | grep OST0003
>  22 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
>
> So I stopped the test system, ran lctl dl again (getting no results), and 
> restarted it.  Once the system was back up I still saw OST3 marked as UP with 
> lctl dl:
> mds: lctl dl | grep OST0003
>  11 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
>
> Why does the mds still think that this OST is up?
>
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
