Assuming Linux, there's some I/O data in /proc/<PID>/io that could be of use, though you'll need to do a bit of math, since it only has counters for bytes, system calls and characters. If you can pin down the offending process, even if it is the server-side process, that might be enough to follow it with strace and see which files are the bad ones.
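As a rough sketch of the "bit of math": sample /proc/<PID>/io twice and divide the change in characters by the change in system calls. A process making a large number of calls that each move only a few bytes is a plausible candidate for the random I/O. The field names (rchar, wchar, syscr, syscw) are the ones the Linux kernel exposes in that file; the helper names, the 10-second default interval and the script name are my own assumptions, and reading another user's counters will generally need root.

```python
#!/usr/bin/env python3
"""Sample /proc/<PID>/io twice and report average characters per system call."""
import sys
import time


def parse_proc_io(text):
    """Turn the 'key: value' lines of /proc/<PID>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_delta(before, after):
    """Per-counter difference between two samples of the same process."""
    return {key: after[key] - before[key] for key in after if key in before}


def chars_per_call(delta):
    """Average characters moved per read call and per write call."""
    reads = delta.get("rchar", 0) / delta["syscr"] if delta.get("syscr") else 0.0
    writes = delta.get("wchar", 0) / delta["syscw"] if delta.get("syscw") else 0.0
    return reads, writes


def sample(pid):
    """Read the current counters for one process."""
    with open(f"/proc/{pid}/io") as handle:
        return parse_proc_io(handle.read())


def main(argv):
    pid = argv[0]
    # Assumed default: 10 seconds between the two samples.
    interval = float(argv[1]) if len(argv) > 1 else 10.0
    first = sample(pid)
    time.sleep(interval)
    delta = io_delta(first, sample(pid))
    per_read, per_write = chars_per_call(delta)
    print(f"read calls:  {delta.get('syscr', 0)} ({per_read:.0f} chars/call)")
    print(f"write calls: {delta.get('syscw', 0)} ({per_write:.0f} chars/call)")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Saved under a hypothetical name such as procio.py, it would be run as `python3 procio.py <PID> 30`. Keep in mind that rchar and wchar count everything passed through read() and write(), including cache hits, pipes and terminals, so a small chars/call figure is only a hint, not proof of random disk access.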
Aaron

On 9/1/16 5:07 PM, Daniel Feenberg wrote:
>
> On Thu, 1 Sep 2016, Rob Taylor wrote:
>
>> Have you tried iotop?
>> It will tell you what processes are moving the most disk io at any
>> given instant.
>> Still might not get you what you want, but it might make it easier to
>> narrow down.
>
> iotop moves the sequential access processes to the top of the list,
> because a process doing sequential access processes more kilobytes/second
> than one doing random access (because of cache hits, among other
> reasons). Our problem program is not "top" in iotop.
>
> Actually, knowing the file name would probably be just as good as
> knowing the process, since we could find the owner of the file and
> contact them.
>
> dan feenberg
>
>>
>> rgt
>>
>> Whitehead Network/System Administrator
>>
>> ----- On Sep 1, 2016, at 3:05 PM, Daniel Feenberg [email protected]
>> wrote:
>>
>>> Apparently heavy random I/O overloaded our fileserver last week, and
>>> response was very slow. We solved the problem with additional spindles,
>>> but we are curious to know which process is doing the random I/O.
>>> Perhaps we could approach that user with an offer to help improve their
>>> turnaround time by changing the code. Our users are mostly inexperienced
>>> students so the possibility of suboptimal code is certainly there. Most
>>> usage is sequential access to very large files that does not load the
>>> fileserver much at all so this has been a new experience for us.
>>>
>>> We can easily track bytes/second but a process doing random I/O may use
>>> very few bytes/second, but still occupy much of the fileserver's
>>> capacity, so it hasn't been fruitful to identify the processes doing the
>>> most reads and writes. During the period of overload, few disks were
>>> showing more than kilobytes/second of read or write, yet iostat revealed
>>> that several disks were continuously at 100%.
>>>
>>> A program such as iostat will tell us which physical disk is busy, lsof
>>> will tell us which file is open by which process, netstat and nfstat
>>> will give aggregate statistics over all processes, but I can't find a
>>> program that will tell us which process is occupying the fileserver's
>>> attention with expensive requests.
>>>
>>> We couldn't replace all the disks with SSD, but might be able to provide
>>> SSD for some files, if we could identify the culprits.
>>>
>>> Daniel Feenberg

--
_______________________________________________________
Aaron Macks([email protected]) [http://www.wiglaf.org/~aaronm ]
My sheep has seven gall bladders, that makes me the King of the Universe!

_______________________________________________
bblisa mailing list
[email protected]
http://www.bblisa.org/mailman/listinfo/bblisa
