Correct. AFAIK to permanently disable the OST, you have to update the
MGS info using the "lctl conf_param" command. I'm not sure there is a way
to undo/back out this command if your bad OST recovers in the future.
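For example (untested sketch, substituting your own fsname and OST index;
here I'm reusing testL and OST0003 from your test system), on the MGS node:

mgs: lctl conf_param testL-OST0003.osc.active=0

If a backout exists, I'd guess it's the same parameter set back to active=1,
but I haven't tried it.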
FYI, I believe you have to run the deactivate command on the MDS and all
active clients.
Another clever approach is to use OST pools to mask out your bad OST.
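As a rough sketch (the pool name "goodosts" is made up; assuming fsname testL
and that OST0003 is the bad one), on the MGS:

mgs: lctl pool_new testL.goodosts
mgs: lctl pool_add testL.goodosts testL-OST[0000-0002,0004]

and then from a client, point new files at the pool:

client: lfs setstripe -p goodosts /testlustre

Existing files on the bad OST are unaffected, but new writes should avoid it.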
regards,
chris hunter
On 01/21/2016 05:00 PM, Kurt Strosahl wrote:
Is that in Lustre 2.5.3?
The lctl --device=xx deactivate is what sets an OST to read-only mode; it
doesn't permanently disable it in the system... or have I missed something?
w/r,
Kurt
----- Original Message -----
From: "Chris Hunter" <[email protected]>
To: "Kurt Strosahl" <[email protected]>
Cc: [email protected]
Sent: Thursday, January 21, 2016 4:13:35 PM
Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds
Hi Kurt,
AFAIK when you set active=0 on the MDS it means "don't write new files
to this OST but still read files". If what you really want is "don't try
to read files from this OST", then you have to flip the active=0 bit on
all your clients. I've never used the llite.lazystatfs option but it may
do the same thing.
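For example (a sketch, reusing the device names from your test system), on
each client something like:

client: lctl set_param osc.testL-OST0003-osc-*.active=0
client: lctl set_param llite.*.lazystatfs=1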
FWIW, an alternative way to deactivate an OST is the command "lctl
--device=XX deactivate".
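e.g. on the MDS, using the device index shown by lctl dl (22 here, but it
will differ on your system):

mds: lctl dl | grep OST0003
  22 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
mds: lctl --device 22 deactivate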
regards,
chris hunter
On 01/21/2016 01:09 PM, Kurt Strosahl wrote:
Good Afternoon Chris,
I have already run the active=0 command on the MDS; is there another step?
From my testing under 2.5.3, the clients will hang indefinitely without
lazystatfs=1.
Our major issue at present is that when the OST died it had a fair amount
of data on it (closing in on 2M files lost), and it seems like the client gets
into a bad state when calls are made repeatedly to files that are lost (but
that still have their OST index information). As the crawl has unlinked files,
the number of errors has dropped, as have the client crashes.
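(For context: the crawl finds the affected files roughly via
"lfs find <mountpoint> --obd <fsname>-OST<index>_UUID" and then unlinks them;
the mount point and OST name are placeholders here.)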
w/r,
Kurt
----- Original Message -----
From: "Chris Hunter" <[email protected]>
To: [email protected]
Cc: "Kurt Strosahl" <[email protected]>
Sent: Thursday, January 21, 2016 12:50:03 PM
Subject: [lustre-discuss] Inactivated ost still showing up on the mds
Hi Kurt,
For reference, when an underlying OST object is missing, this is the
error message generated on our MDS (Lustre 2.5):
Lustre: 12752:0:(mdd_object.c:1983:mdd_dir_page_build()) build page failed: -5!
I suspect that until you update the MGS info, the MDS will still try to
connect to the deactivated OST.
My experience is that sometimes the recipe to deactivate an OST works
flawlessly, while other times the clients hang on the "df" command and time
out on file access. I guess the order in which you run the commands
(i.e. client vs. server) is important.
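(If it helps, the ordering I would try, purely as a sketch: lazystatfs=1 on
the clients first, then active=0 on the clients, then deactivate on the MDS.
I haven't verified which step is the critical one.)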
regards,
chris hunter
From: Kurt Strosahl <[email protected]>
To: [email protected]
Subject: [lustre-discuss] Inactivated ost still showing up on the mds
All,
Continuing the issues that I reported yesterday... I found that by unlinking
lost files I was able to stop the error below from occurring; this gives me
hope that the systems will stop crashing once all the lost files are
scrubbed.
LustreError: 7676:0:(sec.c:379:import_sec_validate_get()) import
ffff880623098800 (NEW) with no sec
LustreError: 7971:0:(sec.c:379:import_sec_validate_get()) import
ffff880623098800 (NEW) with no sec
I do note that the inactivated OST doesn't seem to ever REALLY go away.
After I removed an OST from my test system, I noticed that the MDS still
showed it...
On a client hooked up to the test system...
client: lfs df
UUID 1K-blocks Used Available Use% Mounted on
testL-MDT0000_UUID 1819458432 10112 1819446272 0% /testlustre[MDT:0]
testL-OST0000_UUID 57914433152 12672 57914418432 0% /testlustre[OST:0]
testL-OST0001_UUID 57914433408 12672 57914418688 0% /testlustre[OST:1]
testL-OST0002_UUID 57914433408 12672 57914418688 0% /testlustre[OST:2]
OST0003 : inactive device
testL-OST0004_UUID 57914436992 144896 57914290048 0% /testlustre[OST:4]
On the MDS it still shows as UP when I do lctl dl:
mds: lctl dl | grep OST0003
22 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
So I stopped the test system, ran lctl dl again (getting no results), and
restarted it. Once the system was back up, I still saw OST0003 marked as UP
in lctl dl:
mds: lctl dl | grep OST0003
11 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
Why does the MDS still think that this OST is up?
--
regards,
chris hunter
[email protected]