Hello,

I’m puzzled about the difference between the two mmhealth events

longwaiters_found ERROR Detected Spectrum Scale long-waiters

and

deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock

In particular, why does the latter have severity WARNING only, while the 
first has severity ERROR? longwaiters_found is based on the output of 
‘mmdiag --deadlock’ and occurs much more often on our clusters, while the 
latter is probably triggered by an external event and not by an internal 
mmsysmon check? Is deadlock detection handled by mmfsd? Whenever a deadlock 
is detected, some debug data is collected, which is not true for 
longwaiters_found. Hm, so why is no deadlock detected whenever 
‘mmdiag --deadlock’ shows waiting threads? Shouldn’t the severities be the 
other way round?

Finally: can we trigger some debug data collection whenever a 
longwaiters_found event happens? Just getting the output of 
‘mmdiag --deadlock’ on the single node could give some hints. Without that 
I don’t see any real chance to take any action.
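
A minimal sketch of such a collection hook, in Python, might look like the following. It is only an illustration under several assumptions: the commands live in /usr/lpp/mmfs/bin, new events can be spotted by polling ‘mmhealth node eventlog’ and counting the lines that contain the event name, and the names count_events, collect and watch as well as the output directory are hypothetical. It is not an existing mmhealth feature.

```python
import subprocess
import time
from pathlib import Path

# Assumption: default Spectrum Scale install path for the mm* commands.
MMFS_BIN = Path("/usr/lpp/mmfs/bin")
EVENT = "longwaiters_found"


def run(cmd):
    """Run a command and return whatever it wrote to stdout as text."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def count_events(eventlog_text, event=EVENT):
    """Count the lines that contain the event name as a separate word.

    Assumption: each event occupies one line of the event log output.
    """
    return sum(1 for line in eventlog_text.splitlines() if event in line.split())


def collect(outdir, runner=run):
    """Save the current 'mmdiag --deadlock' output to a timestamped file."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = outdir / f"mmdiag-deadlock-{stamp}.txt"
    target.write_text(runner([str(MMFS_BIN / "mmdiag"), "--deadlock"]))
    return target


def watch(outdir, interval=60, runner=run):
    """Poll the node event log and collect data when new events appear."""
    eventlog_cmd = [str(MMFS_BIN / "mmhealth"), "node", "eventlog"]
    seen = count_events(runner(eventlog_cmd))
    while True:
        time.sleep(interval)
        now = count_events(runner(eventlog_cmd))
        if now > seen:
            collect(outdir, runner)
        seen = now
```

Running watch("/var/tmp/longwaiters") on a node would then leave one file per polling interval in which a new longwaiters_found line showed up. The directory and the 60 second interval are arbitrary choices.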

Thank you,

Heiner
--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
[email protected]
========================



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss