Hello,

I'm puzzled about the difference between the two mmhealth events

longwaiters_found ERROR Detected Spectrum Scale long-waiters

and

deadlock_detected WARNING The cluster detected a Spectrum Scale filesystem deadlock

In particular, why does the latter have level WARNING only, while the first has level ERROR? longwaiters_found is based on the output of 'mmdiag --deadlock' and occurs much more often on our clusters, while the latter is probably triggered by an external event rather than by an internal mmsysmon check? Is deadlock detection handled by mmfsd? Whenever a deadlock is detected some debug data is collected, which is not true for longwaiters_found.

Hm, so why is no deadlock detected whenever 'mmdiag --deadlock' shows waiting threads? Shouldn't the severities be the other way round?

Finally: can we trigger some debug data collection whenever a longwaiters_found event happens? Just getting the output of 'mmdiag --deadlock' on the single node could give some hints. Without it I don't see any real chance to take any action.

Thank you,

Heiner

--
=======================
Heinrich Billich
ETH Zürich
Informatikdienste
Tel.: +41 44 632 72 56
[email protected]
========================
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
