Hi Florian,

Pretty sure you're going to need more data; Brendan's scripts are expecting *stack traces*, not just the latency numbers and the source process - the stacks are where the 'layers' in the flame graph visualizations come from.
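For reference, here's a minimal sketch of the input side. It assumes you've already parsed your SystemTap output into (frames, latency) pairs. That parsing step isn't shown, and the sample data below is made up. flamegraph.pl from the FlameGraph repo reads "folded" lines: frames separated by semicolons, root first, then a space and a count. The count doesn't have to be a sample count. Summed latency works too, so each box's width becomes time spent. `fold()` and `path_frames()` are hypothetical helper names, not part of Brendan's toolkit. `path_frames()` swaps function frames for path components. That gives the per-directory breakdown you asked about, not a real stack flame graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def fold(samples: Iterable[Tuple[List[str], int]]) -> List[str]:
    """Aggregate (frames, weight) samples into folded-stack lines.

    Each output line is 'frame1;frame2;...;frameN weight', root first,
    which is the plain-text format flamegraph.pl reads.
    """
    totals: Dict[str, int] = defaultdict(int)
    for frames, weight in samples:
        # Semicolons separate frames, so replace any that appear inside a name.
        key = ";".join(f.replace(";", "_") for f in frames)
        totals[key] += weight
    return [f"{stack} {total}" for stack, total in sorted(totals.items())]


def path_frames(process: str, path: str) -> List[str]:
    """Hypothetical helper: use a file path's components as 'frames'
    under the process name, so the graph shows I/O time per directory."""
    return [process] + [part for part in path.split("/") if part]


if __name__ == "__main__":
    # Made-up write latencies in microseconds, for illustration only.
    samples = [
        (path_frames("nagios", "/var/spool/nagios/checkresults/c1"), 120),
        (path_frames("nagios", "/var/spool/nagios/checkresults/c2"), 80),
        (path_frames("nagios", "/var/log/nagios/nagios.log"), 300),
    ]
    for line in fold(samples):
        print(line)
```

Save the printed lines to a file, e.g. folded.txt, and render it with `./flamegraph.pl folded.txt > io.svg`. To get the real stack-based graph, you'd feed `fold()` the function frames from each stack instead of path components.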
If you're just collecting the disk latency numbers, and not the function hierarchy of the process, you might want to just use graphite or rrdtool or something similar to plot that data instead. The result should be easy enough to understand, but I don't think it will be as useful as the flame graph output could be. You really want to know *which part* of some process is jamming away at the disk, not just that it is happening, which process is doing it, and when.

[This sort of correlation - "what program feature on my system is making disk latency blow chunks" - is extremely useful and just beyond the edge of what most people's monitoring tools and approaches can deal with. There's a really good reason that the flame graph pages are littered with DTrace code... ;-) ]

best,
--e

On Mon, Mar 10, 2014 at 9:33 PM, Florian Heigl <[email protected]> wrote:
> Hi,
>
> I'm trying to get some dependable data on Nagios IO.
> Nagios does a lot of disk IO, which is known, but there are no hard numbers to it.
> It gets especially for systems that _have_ best practices applied:
> - rrdcached is running, volatile data is written to a RAM disk, etc.
>
> My current approach is using systemtap and collecting only write accesses
> and their latencies.
>
> This I have, using the sys call to IO probe here:
> https://sourceware.org/systemtap/examples/keyword-index.html#FILE
> ...and grep, since I don't really understand all of it.
>
> To turn it into something more worthwhile that can be used by more people
> and show results easily,
> I want to use the flame graph thing as described at
> http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
>
> The whole toolkit seems to be able to work with systemtap.
>
> The problem:
> I'm apparently just too stupid. I don't know how to get started.
> I do not remotely grasp how to take the flamegraph git repo and the script I
> have and make them do "something"
> (something being, a sort on IO time spent per path element of the files
> written to)
>
> Did any of you try something similar?
> Did any of you work with flame graphs and can give some advice?
>
>
> Florian
>
> _______________________________________________
> Discuss mailing list
> [email protected]
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
> This list provided by the League of Professional System Administrators
> http://lopsa.org/
