Hi Ben,
Many thanks for the advice! I am working on scripts to do this. Thanks
also to Peter Braam for his help!
Instead of doing an ls -l, I have mounted all the OSTs as ldiskfs and
built an index of the objects. I then look for text-type files with the
object ID obtained from the getstripe information, and check the content
of each file to verify that it corresponds to that object ID. From this
I can tie the index value on the MDS to the OST that holds the object.
At the end I will run tunefs on each OST with this index value.
I did find files earlier where ls -l appears to work but is actually
pointing at the wrong file.
We have 42 OSTs so it is a bit tedious but not unmanageable!
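For anyone following the same route, the matching step can be sketched roughly as below. This is a minimal sketch of parsing the obdidx/objid table that "lfs getstripe" prints for a file; the sample output, path, and values are illustrative, not taken from our system:

```python
import re

def parse_getstripe(text):
    """Parse the obdidx/objid table printed by 'lfs getstripe <file>'.

    Returns a list of (obdidx, objid) pairs, one per stripe.
    """
    pairs = []
    in_table = False
    for line in text.splitlines():
        if re.match(r"\s*obdidx\s+objid", line):
            in_table = True
            continue
        if in_table:
            fields = line.split()
            if len(fields) >= 2 and fields[0].isdigit():
                pairs.append((int(fields[0]), int(fields[1])))
    return pairs

# Illustrative getstripe output for a file striped to a single OST:
sample = """\
/mnt/terra/somefile.txt
lmm_stripe_count:  1
lmm_stripe_size:   1048576
lmm_stripe_offset: 0
\tobdidx\t\t objid\t\t objid\t\t group
\t     0\t          1234\t        0x4d2\t             0
"""
print(parse_getstripe(sample))  # -> [(0, 1234)]
```

With the (obdidx, objid) pair in hand, the object can then be looked up under the O/ directory of each ldiskfs-mounted OST to see which device actually holds it.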
Regards,
Rodger
On 26/09/2017 15:35, Ben Evans wrote:
I'm guessing on the OSTs, but what you'd want to do is find files that
are striped to a single OST using "lfs getstripe". You'll need one file
per OST.
After that, you'll have to do something like iterate through the OSTs to
find the right combo where an ls -l works for that file. Keep track of
what OST indexes map to what devices, because you'll be destroying them
pretty constantly until you resolve all of them.
Each time you change an OST index, you'll need to do tunefs.lustre
--writeconf on *all* devices to make them register with the MGS again.
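Spelled out as commands, one round of that loop might look like the dry run below. The device paths and the index value are placeholders (assumptions), and the commands are echoed rather than executed; drop the echo once the mapping is confirmed:

```shell
#!/bin/sh
# Dry run of the re-index / re-register sequence.
# Device names and index values here are placeholders, not real devices.
set -eu

# 1. Set the corrected index on the OST just identified:
echo tunefs.lustre --index=7 /dev/mapper/ost_a

# 2. Then make *all* targets (MDT and every OST) re-register with the MGS:
for dev in /dev/mapper/mdt0 /dev/mapper/ost_a /dev/mapper/ost_b; do
  echo tunefs.lustre --writeconf "$dev"
done
```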
-Ben Evans
On 9/26/17, 1:08 AM, "lustre-discuss on behalf of rodger"
<[email protected] on behalf of
[email protected]> wrote:
Dear All,
Apologies for nagging on this!
Does anyone have any insight on assessing progress of the lfsck?
Does anyone have experience of fixing incorrect index values on an OST?
Regards,
Rodger
On 25/09/2017 11:21, rodger wrote:
Dear All,
I'm still struggling with this. I am running an lfsck -A at present. The
status update is reporting:
layout_mdts_init: 0
layout_mdts_scanning-phase1: 1
layout_mdts_scanning-phase2: 0
layout_mdts_completed: 0
layout_mdts_failed: 0
layout_mdts_stopped: 0
layout_mdts_paused: 0
layout_mdts_crashed: 0
layout_mdts_partial: 0
layout_mdts_co-failed: 0
layout_mdts_co-stopped: 0
layout_mdts_co-paused: 0
layout_mdts_unknown: 0
layout_osts_init: 0
layout_osts_scanning-phase1: 0
layout_osts_scanning-phase2: 12
layout_osts_completed: 0
layout_osts_failed: 30
layout_osts_stopped: 0
layout_osts_paused: 0
layout_osts_crashed: 0
layout_osts_partial: 0
layout_osts_co-failed: 0
layout_osts_co-stopped: 0
layout_osts_co-paused: 0
layout_osts_unknown: 0
layout_repaired: 82358851
namespace_mdts_init: 0
namespace_mdts_scanning-phase1: 1
namespace_mdts_scanning-phase2: 0
namespace_mdts_completed: 0
namespace_mdts_failed: 0
namespace_mdts_stopped: 0
namespace_mdts_paused: 0
namespace_mdts_crashed: 0
namespace_mdts_partial: 0
namespace_mdts_co-failed: 0
namespace_mdts_co-stopped: 0
namespace_mdts_co-paused: 0
namespace_mdts_unknown: 0
namespace_osts_init: 0
namespace_osts_scanning-phase1: 0
namespace_osts_scanning-phase2: 0
namespace_osts_completed: 0
namespace_osts_failed: 0
namespace_osts_stopped: 0
namespace_osts_paused: 0
namespace_osts_crashed: 0
namespace_osts_partial: 0
namespace_osts_co-failed: 0
namespace_osts_co-stopped: 0
namespace_osts_co-paused: 0
namespace_osts_unknown: 0
namespace_repaired: 68265278
with the layout_repaired and namespace_repaired values ticking up at
about 10000 per second.
Is the layout_osts_failed value of 30 a concern?
Is there any way to know how far along it is?
I am also seeing many messages similar to the following in
/var/log/messages on the MDT and on the OSS hosting OST0000:
Sep 25 10:48:00 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) terra-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
Sep 25 10:48:00 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 599 previous similar messages
Sep 25 10:48:30 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) srv-terra-MDT0000: Cannot find sequence 0x8: rc = -2
Sep 25 10:48:30 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) Skipped 16593 previous similar messages
Sep 25 10:58:01 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) terra-OST0000-osc-MDT0000: cannot cleanup orphans: rc = -22
Sep 25 10:58:01 mds0l210 kernel: LustreError: 5934:0:(osp_precreate.c:903:osp_precreate_cleanup_orphans()) Skipped 599 previous similar messages
Sep 25 10:58:57 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) srv-terra-MDT0000: Cannot find sequence 0x8: rc = -2
Sep 25 10:58:57 mds0l210 kernel: LustreError: 6137:0:(fld_handler.c:256:fld_server_lookup()) Skipped 40309 previous similar messages
Do these indicate that the process is not working?
Regards,
Rodger
On 23/09/2017 15:07, rodger wrote:
Dear All,
In the process of upgrading 1.8.x to 2.x I've messed up a number of
the index values for OSTs by running tunefs.lustre with the --index
value set. To compound matters, while trying to get the OSTs to mount
I erased the last_rcvd files on the OSTs. I'm looking for a way to
confirm what the index should be for each device. Part of my difficulty
is that, over the evolution of the filesystem, some OSTs were
decommissioned, so the full set no longer has sequential index values.
In practicing for the upgrade, the trial sets I created did have nice
neat sequential indexes, and the process I developed broke when I used
the real data. :-(
The result is that although the Lustre filesystem mounts and all
directories appear to be listed, files in directories mostly have
question marks for their attributes and are not accessible. I'm
assuming this is because the index for the OST holding each file is
wrong.
Any pointers to recovery would be much appreciated!
Regards,
Rodger
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org