As is to be expected, MDT no. 2 did not like the situation either:

:~# cat /proc/fs/lustre/mdt/hebe-MDT0002/recovery_status
status: WAITING
non-ready MDTs:  0001
recovery_start: 1579525859
time_waited: 23
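
For watching several MDTs at once, output like the above can be parsed with a few lines of script. This is only a minimal sketch, assuming recovery_status always prints plain "key: value" lines as shown here. The function names are made up for illustration and are not part of any Lustre tool, and the sample text is simply the output pasted from above.

```python
def parse_recovery_status(text):
    """Parse 'key: value' lines, as printed by recovery_status, into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines or lines without a colon
        fields[key.strip()] = value.strip()
    return fields


def blocking_mdts(fields):
    """Return the MDT indices listed under 'non-ready MDTs' (empty if none)."""
    return fields.get("non-ready MDTs", "").split()


# Sample taken from the hebe-MDT0002 output quoted above.
sample = """\
status: WAITING
non-ready MDTs:  0001
recovery_start: 1579525859
time_waited: 23
"""

status = parse_recovery_status(sample)
print(status["status"], blocking_mdts(status))
```

On a real server the text would come from reading the recovery_status file under /proc/fs/lustre/mdt/ instead of the pasted sample.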


I was already reading LU-9748 and chewing my nails about an ad-hoc upgrade (this is a Lustre 2.10.6 system) when MDT 1 finally relented, obviously getting the necessary logs now that MDT 2 was back and had finished its recovery.
Then, of course, MDT 2 also recovered.


In such a situation, would 'lctl abort_recovery' help?
Or shutting down all three servers and then restarting them in the order 0 - 1 - 2?

Regards,
Thomas


On 20/01/2020 14.00, Thomas Roth wrote:
Hi all,

I had to restart our MDTs 1 and 2.
No.2 is still doing a file system check, no. 1 is mounted again and should be 
in recovery, however:

:~# cat recovery_status
status: WAITING
non-ready MDTs:  0002
recovery_start: 1579524336
time_waited: 538


It seems I have misunderstood the organisation of multiple MDTs: I thought they were independent of each other - except that MDT 0 has the root of the filesystem, of course.

But the others wait for everybody to be online?


Regards,
Thomas




--
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
