Are you using any Lustre monitoring tools? We use ltop from the LMT package (https://github.com/LLNL/lmt), and during one of those periods of high load you could check whether bursts of IOPS are coming in. Running iotop or iostat on the MDS might also provide some insight, if the load is I/O-driven.
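
Since the spikes only last a few minutes, it may help to capture this output automatically rather than trying to catch it by hand. Below is a minimal sketch, not something we run in production: it polls /proc/loadavg and, when the 1-minute load crosses a threshold, appends iostat and iotop output to a log file. The threshold, poll interval, log file name, and the `lctl get_param mdt.*.md_stats` entry are my assumptions; adjust or drop them to match your setup.

```python
import subprocess
import time
from datetime import datetime

LOAD_THRESHOLD = 100.0               # assumed trigger; tune for your MDS
POLL_SECONDS = 10                    # assumed polling interval
LOG_PATH = "mds_load_snapshots.log"  # hypothetical log file name

# Commands to snapshot during a spike. The lctl entry is an assumption
# about which Lustre parameter is useful here; remove it if not applicable.
COMMANDS = {
    "iostat": ["iostat", "-x", "1", "3"],
    "iotop": ["iotop", "-b", "-o", "-n", "3"],
    "md_stats": ["lctl", "get_param", "mdt.*.md_stats"],
}


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg content."""
    return float(text.split()[0])


def should_capture(load, threshold=LOAD_THRESHOLD):
    """True when the load is at or above the trigger threshold."""
    return load >= threshold


def capture(commands):
    """Run each command and return a dict of name -> output text."""
    results = {}
    for name, cmd in commands.items():
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
            results[name] = proc.stdout + proc.stderr
        except FileNotFoundError:
            results[name] = "command not found: " + cmd[0]
        except subprocess.TimeoutExpired:
            results[name] = "timed out: " + " ".join(cmd)
    return results


def main():
    while True:
        with open("/proc/loadavg") as f:
            load = parse_loadavg(f.read())
        if should_capture(load):
            stamp = datetime.now().isoformat(timespec="seconds")
            with open(LOG_PATH, "a") as log:
                log.write("=== %s load=%.2f ===\n" % (stamp, load))
                for name, output in capture(COMMANDS).items():
                    log.write("--- %s ---\n%s\n" % (name, output))
        time.sleep(POLL_SECONDS)


if __name__ == "__main__":
    main()
```

Run it as root on each MDS (iotop needs root), then compare the timestamps in the log against your scheduler's job records to see what was running when the load climbed.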

Cameron

On 5/28/20 8:37 AM, Peeples, Heath wrote:

I have 2 MDSs, and periodically the load on one of them (either one, at one time or another) peaks above 300, causing the file system to basically stop. This lasts for a few minutes and then goes away. We can’t identify any one user running jobs at the times we see this, so it’s hard to pin this on a particular user's activity. Could anyone point me in the direction of how to begin debugging this? Any help is greatly appreciated.

 

Heath


_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
