Wojciech Turek <wj...@...<mailto:wj...@...>> writes:



>

>

> Thank you all for the very useful suggestions. Andreas's approach, which
> uses rpc_history, gave exactly what I was looking for in a form that is
> quite easy to read.

> On 9 July 2010 18:26, Andreas Dilger <[email protected]> wrote:

> On 2010-07-08, at 16:11, Bernd Schubert wrote:

> >> Bernd, would you (or anyone) be interested in enhancing those tools
> >> to be able to show stats data from multiple files at once (each
> >> prefixed by the device name and/or client NID)? I don't think it
> >> makes sense to create separate tools for this.

>



For what it's worth, you can get very detailed client-side stats from collectl.

The way it figures out what the client is doing is to actually look at the
OST-level stats and add them up.  Why?  Because that means you can then replay
the data and break things down by OST.
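
As a rough illustration of that idea (not collectl's actual code), here is a
minimal Python sketch that adds per-OST counters into a client total while
keeping the per-OST breakdown. The sample tuples, field names, and the
summarize() helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical samples for one interval: (ost_name, read_kb, write_kb).
samples = [
    ("OST0000", 1024, 0),
    ("OST0001", 2048, 512),
    ("OST0002", 0, 4096),
]

def summarize(samples):
    """Return (client_totals, per_ost) built from per-OST interval counters."""
    per_ost = {}
    totals = defaultdict(int)
    for ost, read_kb, write_kb in samples:
        # Keep the per-OST values so the data can be broken down later.
        per_ost[ost] = {"read_kb": read_kb, "write_kb": write_kb}
        # The client-level number is just the sum over all OSTs.
        totals["read_kb"] += read_kb
        totals["write_kb"] += write_kb
    return dict(totals), per_ost

totals, per_ost = summarize(samples)
print("client total:", totals)
for ost, vals in sorted(per_ost.items()):
    print(ost, vals)
```

Because the raw per-OST values are retained, the same recorded data can be
reported either as one client total or split out by OST.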



There are also client-side switches to look at BRW stats, readahead stats and
even what's going on with metadata.  If you then plot the data with colplot
you can drill down and look at all kinds of things.  For example, if you have
data from multiple clients you can even compare it side-by-side.  Check out
collectl-utils on sourceforge if you haven't yet.



Alas, I'm one of the few people (I think) who ever gets into this level of
analysis because I fear the number of switches tends to scare people off.  ;)



-mark



> >

> > I'm not sure if the existing Lustre tools are really what we need.
> > If you have a cluster with 200 or more clients and then want to figure
> > out which clients are doing the most IO, several lines per client
> > provide too much output.

> I agree, but having a 200-column line is also not very useful. I like
> the "llobdstat" output where it prints the IO numbers, and then appends
> only the abbreviated values that are changing for that interval, instead
> of printing all of the values.
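
The "only print what changed" behaviour described above can be sketched in a
few lines of Python. This is an illustration of the idea only; it does not
reproduce llobdstat's real output format, and the counter names and helper
functions are hypothetical.

```python
def changed_fields(prev, curr):
    """Return only the counters whose value differs from the previous sample."""
    return {k: v for k, v in curr.items() if prev.get(k) != v}

def format_line(io_kb, prev, curr):
    """One line per interval: the IO number, then only the changed counters."""
    extras = " ".join(
        f"{k}={v}" for k, v in sorted(changed_fields(prev, curr).items())
    )
    return f"{io_kb:>8} KB  {extras}".rstrip()

# Hypothetical counter snapshots from two consecutive intervals.
prev = {"open": 4, "close": 4, "setattr": 1}
curr = {"open": 6, "close": 4, "setattr": 2}
print(format_line(2048, prev, curr))
```

Counters that did not move in the interval ("close" here) are simply left
off the line, which keeps the output narrow even with many counters.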

>

> > One line sorted by IO seems to be better, IMHO.

> The commands that I posted using the rpc_history file will print out a
> summary of all client RPC counts sorted by maximum user. Something
> similar could be done by aggregating all of the per-client stats as well,
> though it would mean touching a lot more input files for each interval.
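
For readers who want the same kind of summary without a shell pipeline, here
is a minimal Python sketch that counts RPC records per client and sorts by
the heaviest user. It is not the command posted earlier in the thread. The
colon-separated record layout and the position of the client NID (third
field) are assumptions, and the sample records are made up.

```python
from collections import Counter

# Hypothetical records. The colon-separated layout and the client NID being
# the third field are assumptions, not a documented format.
SAMPLE = """\
1001:10.0.0.1@tcp:12345-10.0.0.21@tcp:x1:176:Complete
1002:10.0.0.1@tcp:12345-10.0.0.22@tcp:x2:176:Complete
1003:10.0.0.1@tcp:12345-10.0.0.21@tcp:x3:176:Complete
"""

def rpc_counts(text, nid_field=2):
    """Count records per client NID, busiest client first."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split(":")
        if len(fields) > nid_field:  # skip short or empty lines
            counts[fields[nid_field]] += 1
    return counts.most_common()

for nid, n in rpc_counts(SAMPLE):
    print(f"{n:8d} {nid}")
```

If the real file uses a different field order, only the nid_field argument
needs to change.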

>

> > I would be interested in enhancing the existing tools, but if I look
> > at the number of open bugs I have, several of those have a higher
> > priority (btw, this script is on my bug list (bug 22469)).

> I was actually hoping that someone else might take it up. The llstat
> and llobdstat scripts are perl, and there should be a good number of
> people who could do a bit of perl hacking.

> The scripts are currently "vmstat" or "iostat" like, in that they print
> out the parameters as they change over time. It might also be
> interesting (if someone has the perl-fu to do it) to have a "top" mode,
> where it resets the screen position each time and sorts the output from
> all of the clients.
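
A "top" mode along those lines could look roughly like the Python sketch
below: take two counter snapshots, compute the per-client change, reset the
screen with an ANSI escape sequence, and print the busiest clients first.
The client names, counter values, and function names are hypothetical, and
this is not derived from the existing llstat or llobdstat scripts.

```python
import sys

CLEAR = "\033[H\033[2J"  # ANSI: move cursor home, then clear the screen

def deltas(prev, curr):
    """Per-client counter change between two snapshots (client -> count)."""
    return {c: curr[c] - prev.get(c, 0) for c in curr}

def render(rates, limit=10):
    """Format the busiest clients first, one line each."""
    ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)
    return [f"{name:<20} {value:>10}" for name, value in ranked[:limit]]

def refresh(prev, curr, out=sys.stdout):
    """Redraw the whole table in place instead of scrolling."""
    out.write(CLEAR)
    out.write("\n".join(render(deltas(prev, curr))) + "\n")
    out.flush()

# Hypothetical snapshots taken one interval apart.
prev = {"client-a": 100, "client-b": 500}
curr = {"client-a": 900, "client-b": 650, "client-c": 40}
refresh(prev, curr)
```

In a real tool, refresh() would be called in a loop with a sleep between
snapshots, and the limit argument would cap the output at one screenful.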

>

>

>

> Cheers, Andreas

> --

> Andreas Dilger

> Lustre Technical Lead

> Oracle Corporation Canada Inc.

>

>

>

>

> -- --Wojciech Turek

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
