Hello List,

We are trying to debug some issues - or possibly different manifestations of the same issue - on our file system.
The issue causing most grief at the moment is that we sometimes see delays when writing files. From the writing client's end, it simply looks as if I/O stops for a while (we have seen 'pauses' of up to 10 seconds). This appears to be independent of which client does the writing and of the software doing the writing. We investigated this a bit using strace and dd; the 'slow' calls always appear to be open, write, or close calls. Usually these take well below 0.001s; in around 0.5% to 1% of cases, they take up to several seconds. It does not seem to be associated with any specific OST, OSS, or client; there is nothing in any log files, and no exceptional load on the MDS, the OSSes, or any of the clients.

The other issue is that we frequently see delays when trying to read a file. It sometimes takes more than 60s for a file to become visible on one machine after the initial write on a different machine has completed (both machines being Lustre clients). Again, there is nothing in the logs, and no exceptional load on any of the machines.

Any ideas what this could be? How can we debug this?

Clients and servers are using Lustre 1.6.7.2.ddn3.5.

Regards,
Tina

--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
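
For anyone wanting to reproduce the write-latency measurement described above, here is a minimal sketch in Python. It is an assumption-laden illustration, not the strace/dd procedure used in the post: the function name `time_file_ops`, the probe file name, the 4 KiB payload, the iteration count and the 0.1s threshold are all hypothetical. It times each open, write and close call separately and reports those that exceed the threshold.

```python
import os
import tempfile
import time


def time_file_ops(path, payload=b"x" * 4096, iterations=1000, threshold=0.1):
    """Time open/write/close separately.

    Returns a list of (iteration, call_name, seconds) for every call that
    took at least `threshold` seconds.
    """
    slow = []
    for i in range(iterations):
        timings = {}

        start = time.perf_counter()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        timings["open"] = time.perf_counter() - start

        start = time.perf_counter()
        os.write(fd, payload)
        timings["write"] = time.perf_counter() - start

        start = time.perf_counter()
        os.close(fd)
        timings["close"] = time.perf_counter() - start

        for call, elapsed in timings.items():
            if elapsed >= threshold:
                slow.append((i, call, elapsed))
    return slow


if __name__ == "__main__":
    # Hypothetical location: replace with a path on the file system under test.
    target = os.path.join(tempfile.gettempdir(), "latency_probe.dat")
    for iteration, call, elapsed in time_file_ops(target):
        print(f"iteration {iteration}: {call} took {elapsed:.3f}s")
    os.remove(target)
```

Running this from several clients at once, each against its own file, would show whether the slow calls coincide in time across clients, which may help narrow down whether the stalls originate on the server side.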
