On 02/10/2011 09:16 PM, Satoshi Isono wrote:
> Dear members,
>
> I am looking for a way to detect the userid or jobid on the Lustre
> client. Assume the following conditions:
>
> 1) Arbitrary users run arbitrary jobs through a scheduler such as PBS Pro,
> LSF or SGE.
> 2) One user's processes monopolize Lustre I/O.
> 3) Some Lustre servers (MDS? OSS?) can detect high I/O load on each server.
> 4) But the Lustre servers cannot map the heavily loading I/O processes back
> to a jobid/userid, because userids are not available on the Lustre
> servers.
> 5) I would like Lustre to monitor the load and make that mapping.
> 6) If (5) is possible, we can write a script that runs a scheduler
> command such as qdel.
> 7) The heavy user's job will then be killed by the job scheduler.
>
> I would like Lustre to provide (5) as a capability, but I guess the
> current Lustre 1.8 cannot do it. Alternatively, is there any way to map
> Lustre I/O processes to a userid/jobid using something like rpctrace or
> the per-NID stats? Could you please give me your advice or comments?
I've written a utility called lltop that gathers I/O statistics from the Lustre servers, along with job assignment data from the cluster's batch scheduler, to give a job-by-job accounting of filesystem load. Here's its output, with names changed to protect the innocent:

  $ sudo tacc_lltop work
  JOBID    WR_MB  RD_MB  REQS  OWNER  WORKDIR
  1823815   2101      0  4176  al     /work/000/al/job1
  1823060    774      0  1570  bob    /work/000/bob/fftw
  1823634    323      3  3244  chas   /work/000/chas/boltzeq
  1823768    289      0  5108  deb    /work/000/deb/mesh-08
  1823085     55      0   110  ed     /work/000/ed/jumble
  login3      18      3  2961

We use it on several systems, so far only with SGE, but it can be hooked up to other schedulers. See https://github.com/jhammond/lltop for source and documentation.

Best,
John

--
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin

_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
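The per-job accounting described in the reply amounts to joining two tables: per-client I/O counters collected from the Lustre servers, and the scheduler's record of which hosts each job runs on. The Python sketch below shows that join under stated assumptions; it is not lltop's actual code. It assumes the server counters have already been gathered and their client NIDs resolved to hostnames, that each compute host runs at most one job at a time, and that sizes are reported in MiB. The function `per_job_load`, its input layouts, and the hostnames and numbers in the usage example are all hypothetical.

```python
from collections import defaultdict

MB = 1 << 20  # assumption: WR_MB/RD_MB are MiB (2**20 bytes)


def per_job_load(host_stats, jobs):
    """Aggregate per-host I/O counters into per-job totals (hypothetical sketch).

    host_stats: hostname -> (bytes_written, bytes_read, requests), assumed to be
                already collected from the servers with NIDs resolved to hostnames.
    jobs:       jobid -> (owner, workdir, [hostnames]), assumed to come from
                the batch scheduler.
    Returns rows (key, wr_mb, rd_mb, reqs, owner, workdir), heaviest writer first.
    """
    # Invert the scheduler data into host -> jobid. Assumes one job per host;
    # if a host appeared in several jobs, the last one seen would win here.
    host_to_job = {}
    for jobid, (_owner, _workdir, hosts) in jobs.items():
        for host in hosts:
            host_to_job[host] = jobid

    # Sum counters per job; hosts outside any job (e.g. login nodes)
    # are kept under their own hostname instead of being dropped.
    totals = defaultdict(lambda: [0, 0, 0])
    for host, (wr, rd, reqs) in host_stats.items():
        key = host_to_job.get(host, host)
        t = totals[key]
        t[0] += wr
        t[1] += rd
        t[2] += reqs

    rows = []
    for key, (wr, rd, reqs) in totals.items():
        owner, workdir, _hosts = jobs.get(key, ("", "", []))
        rows.append((key, wr // MB, rd // MB, reqs, owner, workdir))

    # Sort by megabytes written, largest first.
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows


if __name__ == "__main__":
    # Made-up example inputs, for illustration only.
    host_stats = {
        "c101": (1024 * MB, 0, 2000),
        "c102": (512 * MB, 0, 1000),
        "c201": (256 * MB, 4 * MB, 300),
        "login1": (8 * MB, 2 * MB, 50),
    }
    jobs = {
        "1001": ("al", "/work/000/al/job1", ["c101", "c102"]),
        "1002": ("bob", "/work/000/bob/fftw", ["c201"]),
    }
    print("JOBID     WR_MB  RD_MB   REQS  OWNER WORKDIR")
    for row in per_job_load(host_stats, jobs):
        print("%-8s %6d %6d %6d  %-5s %s" % row)
```

Keying unassigned hosts by hostname keeps login-node traffic visible rather than silently discarding it. A host shared by several jobs would need a different attribution rule, since server-side counters alone cannot tell the jobs apart.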
