Re: [lustre-discuss] ZFS MDT Corruption

Scott Ruffner via lustre-discuss Sun, 18 Sep 2022 15:39:54 -0700

On Fri, Sep 16, 2022 at 4:25 PM Christian Kuntz <[email protected]>
wrote:


> Oof! That's not a good situation to be in. Unfortunately, I've hit the
> dual import situation before as well, and as far as I know once you have
> two nodes import a pool at the same time you're more or less hosed.
>

Many hours later, I'm now coming to that conclusion.

> When it happened to me, I tried using zdb to read all the recent TXGs to
> try to back track the pool to a previously working state, but unfortunately
> none of it worked, I think I tried 30 in all. You could try that route,
> maybe you'll be luckier than I.
>

I have been using zdb to look for a TXG to roll back to; that's the stage I'm at now.

> Now might be the time to dust off any remote backups you have or reach out
> to ZFS recovery specialists. Additionally, _always_ enable `zpool set
> multihost=on <poolname>` for any pool that can be imported by more than one
> node for this reason. You can ignore hostid checking safely with `zpool
> import -f`, but without multihost set to on you have no protection against
> simultaneous imports.
>
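As a rough sketch of the multihost advice above: the pool name `mdtpool` and the `enable_multihost` helper are made up for illustration, and the script only prints the commands unless `RUN` is set to an empty string. It assumes each node needs its own `/etc/hostid` before multihost protection is useful.

```shell
#!/bin/sh
# Dry-run by default: commands are printed, not executed.
# Export RUN="" to really run them (as root, on a node with ZFS installed).
RUN=${RUN-echo}

# enable_multihost is a hypothetical helper, not part of ZFS.
enable_multihost() {
    pool=$1   # pool name supplied by the caller
    # multihost relies on a per-node hostid; generate one if none exists.
    [ -e /etc/hostid ] || $RUN zgenhostid
    # Turn the protection on, then read the property back to confirm.
    $RUN zpool set multihost=on "$pool"
    $RUN zpool get multihost "$pool"
}

enable_multihost mdtpool
```

This would have to be done once per pool, with a distinct hostid on every node that can see the disks.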

Sadly, there are no backups or snapshots - the system was intended as
ephemeral /scratch storage, so we just don't have that.

> For rollback, look into the `-X` and `-T` pool import options. The man page
> for `zdb` should be able to answer most of your questions. Otherwise, a
> common actor in the ZFS recovery scene is https://www.ufsexplorer.com/ (or
> at least as far as I've seen).
>
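A minimal sketch of the TXG hunt described above, under some assumptions: that `zdb -lu <device>` prints one `txg = <number>` line per uberblock (check your zdb version), and that a read-only import with `-N` is the safest way to try a candidate. The helpers `list_txgs` and `import_cmd`, the pool name `mdtpool`, and the sample numbers are all made up; nothing here touches a real pool.

```shell
#!/bin/sh
# list_txgs: read uberblock dump text on stdin and print the distinct
# TXG numbers, newest first. Assumes lines of the form "txg = <number>".
list_txgs() {
    sed -n 's/^[[:space:]]*txg = \([0-9][0-9]*\)$/\1/p' | sort -rnu
}

# import_cmd: print (do not run) a read-only rollback import for one TXG.
# -N skips mounting datasets, -T selects the TXG, and -f overrides the
# hostid check, so only use it when no other node has the pool imported.
import_cmd() {
    txg=$1
    pool=$2
    echo "zpool import -N -o readonly=on -f -T $txg $pool"
}

# Sample text standing in for real zdb output (values are invented).
printf '    txg = 4821\n    timestamp = 1663300000\n    txg = 4823\n    txg = 4821\n' |
list_txgs |
while read -r txg; do
    import_cmd "$txg" mdtpool
done
```

On a real system the input would come from something like `zdb -lu` run against each pool device, and each printed command would be tried by hand, newest TXG first.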

I've tried a few. However, this is the MDT for a Lustre filesystem, so I
can't really roll back very far without introducing corruption into the
Lustre system...so...yeah.

Thanks for responding. I'm talking to the UFS Explorer people; it's worth a
single-system copy of their Pro product to see if it performs a miracle.

Thanks!

Scott

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
