Nathan,

I've created a Jira issue for this, LU-13285<https://jira.whamcloud.com/browse/LU-13285>. In it I attached the output of an strace in which I was able to capture a string of both successful and failed df runs.

________________________________
From: Nathan Dauchy - NOAA Affiliate <[email protected]>
Sent: Thursday, February 20, 2020 2:35 PM
To: Konzem, Kevin P <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [EXTERNAL] Re: [lustre-discuss] DF bug with lustre 2.12.4
On Thu, Feb 20, 2020 at 11:47 AM Konzem, Kevin P <[email protected]> wrote:

test this by running 'while [ true ];do /bin/df -TP /performance;done' on two sessions on the same client. As soon as I start the second while loop, the outputs go from:

    Filesystem                 Type   1024-blocks   Used Available Capacity Mounted on
    192.168.0.181@tcp:/perform lustre    71467728 100416  67664944       1% /performance

to:

    Filesystem                 Type   1024-blocks   Used Available Capacity Mounted on
    192.168.0.181@tcp:/perform lustre           0     -0        -0      50% /performance

Kevin,

I can confirm seeing this issue intermittently as well, and usually a re-run of df gives reasonable results again. It looks like you have a more reliable reproducer, though, which is good!

A support ticket was opened with our vendor, and they said that capturing a "strace" of a bad run might be helpful, but I haven't caught it in the act yet. With your reproducer, can you get that and open a Jira ticket to track the problem?

As a workaround, try "lfs df" instead; it may take a different code path that avoids the bug.

-Nathan
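For anyone else trying to catch a bad run, here is a rough sketch of how the reproducer quoted above could be combined with strace. The helper names `df_row_is_bad` and `capture_bad_df` are made up for illustration and are not from this thread or from LU-13285. The check assumes the symptom always shows up as a 0 in the 1024-blocks column of `df -TP` output, as in the example above. Whether running df under strace changes the timing enough to hide the problem is not known.

```shell
#!/bin/sh
# Hypothetical helpers (illustrative names, not from the thread).

# Succeeds (exit 0) if the `df -TP` output on stdin contains a data row
# whose 1024-blocks column (field 3) is 0, i.e. the symptom shown above.
df_row_is_bad() {
    awk 'NR > 1 && $3 == 0 { bad = 1 } END { exit !bad }'
}

# Repeatedly runs df under strace on mount point $1, overwriting the trace
# file $2 each time, and stops at the first bad run so that $2 holds the
# trace of that run.
capture_bad_df() {
    mnt=$1
    trace=$2
    while :; do
        out=$(strace -f -o "$trace" /bin/df -TP "$mnt")
        if printf '%s\n' "$out" | df_row_is_bad; then
            printf '%s\n' "$out"
            return 0
        fi
    done
}
```

With the plain 'while [ true ];do /bin/df -TP /performance;done' loop running in one session, something like `capture_bad_df /performance /tmp/df-bad.strace` could be run in a second session on the same client; the trace path is only an example.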
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
