>> Be careful here. You can certainly stick some data into an rrd but
>> certainly not all of it, especially if you want to collect a lot of
>> it at a reasonable frequency. If you want accurate detail plots,
>> you've gotta go to the data stored on each separate system. I just
>> don't see any way around this, at least not yet...
>>
>
> Yes, you're absolutely right. Given its intrinsic multi-scale nature, a
> RRD is well suited for keeping historical data on large time scales.
> This could allow a very convenient graphical overview of the different
> system metrics, but would be pointless for debugging purposes, where
> you do need fine-grained data. That's where collectl is the most useful
> for me.
>
> But what about both? I don't see any reason why collectl couldn't
> provide high-frequency accurate data to diagnose problems locally, and
> at the same time allow aggregating less precise values in RRD for
> global visualization of multi-host systems.
>

I agree 1000%... The mental model I've been building in my head is to
tell collectl to log its data locally and also write out an
s-expression using --sexpr. Then a daemon can periodically pick out the
data it's interested in, at whatever frequency it's interested in, and
forward it on up the line.

>> As a final note, I've put together a tutorial on using collectl in a
>> lustre environment and have uploaded a preliminary copy at
>> http://collectl.sourceforge.net/Tutorial-Lustre.html in case anyone
>> wants to preview it before I link it into the documentation.
>> If nothing else, look at my very last example where I show what you
>> can see by monitoring lustre at the same time as your network
>> interface.
>>
>
> Very good, thanks for this. The readahead experiment is insightful.
>

It was to me when I first encountered the problem.

>> Did I also mention that collectl is probably one of the few tools
>> that can monitor your Infiniband traffic as well?
>>
>
> That's why it rocks. :)
>
> Now the only thing which still makes me want to use other monitoring
> software is the ability to get a global view. Centralized data
> collection and easy graphing (RRD feeding) are still what I need most
> of the time.
>

I hear you here too. That's the main reason I put in the ability to
generate data in plottable format. That's as close as I'm willing to go
with providing a graphing capability in collectl itself. I'm trying
real hard to bound its scope as I figure it already has more than
enough switches... 9-)

-mark
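
To make the "log locally, let a daemon pick out what it wants" idea a
bit more concrete, here is a minimal Python sketch of the picking step.
It is only an illustration under stated assumptions: the SAMPLE text is
a made-up s-expression, not the real layout of collectl's --sexpr
output, and the names parse, lookup and pick are hypothetical helpers,
not part of collectl.

```python
import re

# Hypothetical sample; the real layout of collectl's --sexpr output may differ.
SAMPLE = "(sample (time 1200000000) (cpu (user 12) (sys 3)) (net (rxkb 840.5) (txkb 95)))"

# A token is either a single parenthesis or a run of non-space, non-paren characters.
TOKEN = re.compile(r"[()]|[^\s()]+")


def parse(text):
    """Parse one s-expression into nested lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing ")"
            return items
        return tok

    return read()


def lookup(tree, path):
    """Follow a path of names such as ("net", "rxkb"); return a float or None."""
    node = tree
    for name in path:
        for child in node[1:]:
            if isinstance(child, list) and child and child[0] == name:
                node = child
                break
        else:
            return None  # this name is not present at this level
    # Only a (name value) pair carries a number; anything else is a subtree.
    if len(node) == 2 and isinstance(node[1], str):
        return float(node[1])
    return None


def pick(text, wanted):
    """Return only the metrics a forwarding daemon cares about."""
    tree = parse(text)
    return {".".join(path): lookup(tree, path) for path in wanted}


if __name__ == "__main__":
    print(pick(SAMPLE, [("net", "rxkb"), ("cpu", "user")]))
```

A forwarding daemon along these lines would reread the file at its own
(coarser) interval, call pick on the contents, and hand the result to
whatever sits up the line, for example something that feeds an RRD.
The full-resolution data stays in the local collectl logs either way.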
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
