On Fri, Sep 16, 2022 at 4:25 PM Christian Kuntz <[email protected]> wrote:
> Oof! That's not a good situation to be in. Unfortunately, I've hit the
> dual import situation before as well, and as far as I know once you have
> two nodes import a pool at the same time you're more or less hosed.

Many hours later, I'm now coming to that conclusion.

> When it happened to me, I tried using zdb to read all the recent TXGs to
> try to back track the pool to a previously working state, but unfortunately
> none of it worked, I think I tried 30 in all. You could try that route,
> maybe you'll be luckier than I.

I have tried using zdb to find TXG to roll back to - on that stage now.

> Now might be the time to dust off any remote backups you have or reach out
> to ZFS recovery specialists. Additionally, _always_ enable `zpool set
> multihost=on <poolname>` for any pool that can be imported by more than one
> node for this reason. You can ignore hostid checking safely with `zpool
> import -f`, but without multihost set to on you have no protection against
> simultaneous imports.

Sadly, there are no backups or snapshots - the system was intended as
ephemeral /scratch storage, so we just don't have that.

> For rollback, look into the `-X` and `-T` pool import options. The man page
> for `zdb` should be able to answer most of your questions. Otherwise, a
> common actor in the ZFS recovery scene is https://www.ufsexplorer.com/ (or
> at least as far as I've seen).

I've tried a few, however, this is the MDT for a lustre filesystem, so I
can't really roll back very far without introducing corruption into the
Lustre system...so...yeah.

Thanks for responding. I'm talking to the ufs explorer people, it's worth a
single system copy of their Pro product to see if it performs a miracle.

Thanks!
Scott
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
