Hi all,

This morning one of our MDTs went 'unhealthy':

> Jul 26 10:15:13 lxmds20 kernel: LustreError: 9510:0:(service.c:3285:ptlrpc_svcpt_health_check()) mdt: unhealthy - request has been waiting 1017s
...

However, somewhat later,

> lxmds20:~# cat /sys/fs/lustre/health_check
> healthy

and all Lustre operations seem to be good, too.
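
For anyone who wants to watch that file over time rather than check it by hand, here is a minimal sketch in Python. The path and the "healthy" string are the ones shown above. The function name is_healthy is made up for illustration, and treating any other content (or an unreadable file) as "not healthy" is my assumption, not documented Lustre behaviour.

```python
from pathlib import Path

# Path as shown in the message above; it may differ on other Lustre versions.
HEALTH_CHECK = Path("/sys/fs/lustre/health_check")


def is_healthy(path=HEALTH_CHECK):
    """Return True only if the first line of the file is exactly 'healthy'.

    Assumption: anything else, including a missing or unreadable file,
    is reported as not healthy.
    """
    try:
        lines = Path(path).read_text().splitlines()
    except OSError:
        return False
    return bool(lines) and lines[0].strip() == "healthy"
```

Calling is_healthy() from a cron job or a monitoring plugin and logging the result with a timestamp would at least show how long the unhealthy state lasted.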


It used to be that if an MDT went unhealthy, all of Lustre was in a kind of
'undefined' state, and you had to reboot and run a file system check.
This is now Lustre 2.10.6. Can it heal itself reliably, or should we still
take some action?

Regards
Thomas



-- 
--------------------------------------------------------------------
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Ursula Weyrich, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Georg Schütte