Perhaps you haven't drunk the collectl kool-aid yet. 9-) With collectl you can dynamically monitor lots of system resources in real time, including Lustre. Here's an example on a system using GigE as the interconnect, but I could just as easily have chosen to show IB, memory, or a variety of other types of data. I could log it to a file and play it back later, or even convert it to a form suitable for plotting with gnuplot. This is what I can see on an MDS, but there's a whole lot more on the OSS or client. See http://collectl.sourceforge.net/Tutorial-Lustre.html or just go to collectl's home page at http://collectl.sourceforge.net/

$ collectl -s cnl -oT
#        <-------CPU--------><-----------Network----------><--------Lustre MDS-------->
#Time    cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out mdsCls Getatt Reint sync
16:50:15   0   0  1781   555     48    253     53     175     11     17    11    0
16:50:16   1   1  1679   518     46    240     48     158      9     13     9    0
16:50:17   0   0  1615   376     39    192     43     130      5     13     5    0
16:50:18   2   1  1933   693     37    212     41     149     12     24    12    0
16:50:19   1   0  1870   598     59    297     63     210     13     17    13    0
16:50:20   0   0  1835   555     45    225     45     155     11     17    11    0

My point is I want to see what Lustre is doing (or at least thinks it's doing). Is there a way to tell dynamically how many statfs calls it's making? I guess I would have thought that the MDS would track something like this.

I guess my point is that if you're tracking something like stat calls and you occasionally see someone hammering on the MDS, the timestamps let you go back through the historical list of which processes were running and when (yes, collectl monitors them too) and find who was running at the time. When you have a set of data correlated across both the filesystem and all the clients, you have an incredibly powerful diagnostic capability.

-mark

Brian J. Murrell wrote:
> On Thu, 2008-03-27 at 14:23 -0400, Mark Seger wrote:
>
>> In particular I'm looking at a non-zero value for statfs but no matter what
>> I do it doesn't change!
>
> Not surprising. "man statfs". Not something I'd expect a lot of calls
> to from common userspace applications. (Untested but) Trying using
> "stat(1)" with the "-f" argument on your lustre mount point and see if
> that bumps up the counter.
>
> b.
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
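
For anyone who wants to try the timestamp correlation described in this message, here is a minimal sketch in Python. It assumes the text layout of the "collectl -s cnl -oT" output shown in this message (a "#Time" header row followed by one row of integers per second). The function names parse_collectl and busy_seconds and the threshold of 20 are made up for illustration; they are not part of collectl.

```python
# Sample rows copied from the collectl output in this message.
SAMPLE = """\
#Time    cpu sys inter ctxsw netKBi pkt-in netKBo pkt-out mdsCls Getatt Reint sync
16:50:15   0   0  1781   555     48    253     53     175     11     17    11    0
16:50:16   1   1  1679   518     46    240     48     158      9     13     9    0
16:50:17   0   0  1615   376     39    192     43     130      5     13     5    0
16:50:18   2   1  1933   693     37    212     41     149     12     24    12    0
16:50:19   1   0  1870   598     59    297     63     210     13     17    13    0
16:50:20   0   0  1835   555     45    225     45     155     11     17    11    0
"""


def parse_collectl(text):
    """Turn the tabular text into a list of dicts keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#Time"):
            # The header row names the columns; drop the leading '#'.
            columns = line.lstrip("#").split()
            continue
        if not line or line.startswith("#") or columns is None:
            # Skip blank lines, the group banner, and anything before the header.
            continue
        fields = line.split()
        if len(fields) != len(columns):
            continue  # ignore rows that do not match the header
        row = {"Time": fields[0]}
        for name, value in zip(columns[1:], fields[1:]):
            row[name] = int(value)
        rows.append(row)
    return rows


def busy_seconds(rows, column, threshold):
    """Return the timestamps at which `column` reached `threshold` or more."""
    return [row["Time"] for row in rows if row[column] >= threshold]


if __name__ == "__main__":
    rows = parse_collectl(SAMPLE)
    # Hypothetical threshold: flag seconds with 20 or more Getatt operations.
    print(busy_seconds(rows, "Getatt", 20))
```

The timestamps this returns are what you would then look up in the process data recorded over the same interval to see who was running at the time.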
