Hi Kurt,
For reference, when an underlying OST object is missing, this is the error message generated on our MDS (Lustre 2.5):
Lustre: 12752:0:(mdd_object.c:1983:mdd_dir_page_build()) build page failed: -5!

I suspect that until you update the MGS info, the MDS will still connect to the deactivated OST.

In my experience, the recipe to deactivate an OST sometimes works flawlessly; other times the clients hang on the "df" command and time out on file access. I guess the order in which you run the commands (i.e. client vs. server) is important.

regards,
chris hunter

From: Kurt Strosahl <[email protected]>
To: [email protected]
Subject: [lustre-discuss] Inactivated ost still showing up on the mds

All,

   Continuing the issues that I reported yesterday...  I found that by 
unlinking the lost files I was able to stop the error below from occurring. 
This gives me hope that the systems will stop crashing once all the lost files 
are scrubbed.

LustreError: 7676:0:(sec.c:379:import_sec_validate_get()) import 
ffff880623098800 (NEW) with no sec
LustreError: 7971:0:(sec.c:379:import_sec_validate_get()) import 
ffff880623098800 (NEW) with no sec

   I do note that the inactivated ost doesn't seem to ever REALLY go away.  
After I removed an ost from my test system I noticed that the mds still showed 
it...

On a client hooked up to the test system...
client: lfs df
UUID                   1K-blocks        Used   Available Use% Mounted on
testL-MDT0000_UUID    1819458432       10112  1819446272   0% /testlustre[MDT:0]
testL-OST0000_UUID   57914433152       12672 57914418432   0% /testlustre[OST:0]
testL-OST0001_UUID   57914433408       12672 57914418688   0% /testlustre[OST:1]
testL-OST0002_UUID   57914433408       12672 57914418688   0% /testlustre[OST:2]
OST0003             : inactive device
testL-OST0004_UUID   57914436992      144896 57914290048   0% /testlustre[OST:4]

on the mds it still shows as up when I do lctl dl:
mds: lctl dl | grep OST0003
 22 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5

So I stopped the test system, ran lctl dl again (getting no results), and 
restarted it.  Once the system was back up I still saw OST3 marked as UP with 
lctl dl:
mds: lctl dl | grep OST0003
 11 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5
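For anyone who wants to check for this mismatch automatically, here is a rough sketch in Python. It only parses text that has already been captured from "lfs df" on a client and "lctl dl" on the MDS, and it assumes the output layout matches what is pasted above. The function names (inactive_osts, mds_device_status, still_up_on_mds) are made up for illustration and are not part of any Lustre tool.

```python
import re

# Assumption: OST names look like "OST" followed by four hex digits,
# as in the output pasted in this message.
OST_NAME = r"OST[0-9a-fA-F]{4}"


def inactive_osts(lfs_df_output):
    """Return OST names that the client's `lfs df` text marks as inactive."""
    names = []
    for line in lfs_df_output.splitlines():
        m = re.match(r"\s*(" + OST_NAME + r")\s*:\s*inactive device", line)
        if m:
            names.append(m.group(1))
    return names


def mds_device_status(lctl_dl_output):
    """Map each OST name found in the MDS's `lctl dl` text to its status."""
    status = {}
    for line in lctl_dl_output.splitlines():
        fields = line.split()
        # Assumed column layout: index, status, type, name, uuid, refcount
        if len(fields) < 4:
            continue
        m = re.search(r"-(" + OST_NAME + r")-", fields[3])
        if m:
            status[m.group(1)] = fields[1]
    return status


def still_up_on_mds(lfs_df_output, lctl_dl_output):
    """List OSTs that are inactive on the client but still UP on the MDS."""
    status = mds_device_status(lctl_dl_output)
    return [n for n in inactive_osts(lfs_df_output) if status.get(n) == "UP"]


if __name__ == "__main__":
    client_text = "OST0003             : inactive device\n"
    mds_text = " 11 UP osp testL-OST0003-osc-MDT0000 testL-MDT0000-mdtlov_UUID 5\n"
    print(still_up_on_mds(client_text, mds_text))
```

Feeding it the two snippets from this message should flag OST0003 as the device the client and the MDS disagree about. It says nothing about why the MDS still lists the device.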

Why does the mds still think that this OST is up?


_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
