Lars, you are right, and I saw that my guess to use /proc/stat was wrong. top is slow in getting the current CPU usage. So basically I wondered if you need the CPU usage at all. If you'd switch to "load", you could get it a lot faster.
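
As a rough illustration of how cheap reading the load can be, here is a minimal sketch. It assumes Linux and that "load" means the 1-minute load average in the first field of /proc/loadavg; the function name read_load1 is made up for this example and is not part of any agent.

```shell
#!/bin/bash
# Hypothetical helper: print the 1-minute load average.
# Assumes a Linux-style loadavg file; the path can be overridden for testing.
read_load1()
{
    local file=${1:-/proc/loadavg}
    local load1 rest
    # The first whitespace-separated field is the 1-minute load average.
    read -r load1 rest < "$file"
    # Fail if the file was missing or empty.
    [ -n "$load1" ] || return 1
    echo "$load1"
}
```

Calling read_load1 without arguments reads /proc/loadavg directly. It is a single file read with no sampling delay, but note that load and CPU usage are different measures.
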
To be honest: I wondered what "HealthCPU" would monitor about the CPU's "health" when initially looking into it. I was kind of disappointed to see that it simply inspects the CPU usage (a CPU that is 100% busy (0% idle) may be quite healthy) ;-)

Regards,
Ulrich

>>> Lars Ellenberg <[email protected]> wrote on 04.02.2011 at 13:26 in message <[email protected]>:
> On Thu, Feb 03, 2011 at 01:09:04PM +0100, Michael Schwartzkopff wrote:
> > On Thursday 03 February 2011 12:35:34 Ulrich Windl wrote:
> > > Hi!
> > >
> > > I'm starting to explore Linux-HA. Examining one of the monitors, I think
> > > things could be made much more efficient. For example: To get the percent
> > > of idle CPU the monitor uses 4 processes: top -b -n2 | grep Cpu | tail -1
> > > | awk -F",|\.[0-9]%id" '{ print $4 }'
> > >
> > > However awk can do the effect of grep and tail as well. My first attempt is
> > > this: top -b -n2 | awk -F",|\.[0-9]%id" '/^Cpu/{ print $4; exit }'
> > >
> > > My second attempt uses /proc/stat instead, avoiding the slow top process:
> > > awk '$1 == "cpu" { print $7; exit }' /proc/stat
> > >
> > > time (top -b -n2 | grep Cpu | tail -1 | awk -F",|\.[0-9]%id" '{ print $4 }')
> > > awk: warning: escape sequence `\.' treated as plain `.'
> > > 99
> > >
> > > real 0m3.533s
> > > user 0m0.008s
> > > sys 0m0.008s
> > >
> > > time (top -b -n2| awk -F",|\.[0-9]%id" '/^Cpu/{ print $4; exit }')
> > > awk: warning: escape sequence `\.' treated as plain `.'
> > > 99
>
> Ouch. Big FAIL here already ;-)
>
> > > real 0m0.518s
> > > user 0m0.000s
> > > sys 0m0.008s
> > >
> > > time awk '$1 == "cpu" { print $7; exit }' /proc/stat
> > > 98
>
> And you actually believe that this was the equivalent of the above
> top | etc pipe ?
>
> See below.
>
> > > real 0m0.004s
> > > user 0m0.000s
> > > sys 0m0.000s
> > >
> > > Regards,
> > > Ulrich
> >
> > Hi,
> >
> > good idea.
> > The only problem is that the information in /proc/stat is in ticks
> > and does not give you an absolute value. So you would have to calculate the
> > difference yourself, which makes the task much more difficult.
>
> 1) /proc/stat is linux specific.
> 2) /proc/stat is what top samples on linux ;-)
> 3) it is in "USER_HZ", so it's in centiseconds,
> which makes it easy enough to calculate a meaningful difference.
> besides, as long as it is _any_ consistent unit,
> the unit does not matter, as it is in both numerator and denominator ;-)
>
> 4) time top -b -n2 | pipe
> vs. time cat /proc/stat
> ...
> You do realize that to get any meaningful measure about "current"
> cpu usage, while the only measure readily available is "cpu usage
> since system boot", you need to watch it for a while?
>
> $ strace -tt -T -e read,select top -b -n2 2>&1 1>/dev/null |
> grep -Ee ' read[(][0-9]*, "cpu | select[(]'
> 12:27:55.985830 read(3, "cpu 2146371 14 1345585 66007476"..., 8192) = 1586 <0.000395>
> 12:27:56.065141 select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout) <0.500579>
> 12:27:56.644309 read(5, "cpu 2146377 14 1345595 66007497"..., 1024) = 1024 <0.000164>
> 12:27:56.645109 select(0, NULL, NULL, NULL, {3, 0}) = 0 (Timeout) <3.002688>
> 12:27:59.730617 read(5, "cpu 2146379 14 1345604 66007624"..., 1024) = 1024 <0.000164>
>
> [lars@soda:~/DRBD/drbd-8.3]$ strace -tt -T -e read,select top -b -n2 -d 0.1 2>&1 1>/dev/null |
> grep -Ee ' read[(][0-9]*, "cpu | select[(]'
> 12:28:14.834497 read(3, "cpu 2146386 14 1345657 66008224"..., 8192) = 1586 <0.000395>
> 12:28:14.919021 select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout) <0.500582>
> 12:28:15.501349 read(5, "cpu 2146391 14 1345669 66008247"..., 1024) = 1024 <0.000165>
> 12:28:15.502149 select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout) <0.100165>
> 12:28:15.681910 read(5, "cpu 2146394 14 1345674 66008252"..., 1024) = 1024 <0.000162>
>
> Notice something?
> (mind the timestamp, and the -d option...)
>
> Default delay of top on my setup seems to be 3 seconds. If you want the same
> "accuracy", then any replacement would need to sleep the same three seconds
> between two samplings of /proc/stat.
>
> So what you "measure" with your first variant (the print $4; exit)
> is the first "estimate" of top, after sampling /proc/stat twice with
> a hardcoded delay of 0.5 seconds, not waiting for the better, 3 seconds
> sampling period. While this may be "good enough", it is certainly not
> equivalent.
>
> You could drop the exit from awk, and do top -b -n1 to achieve the same.
>
> And exactly what do you think you measure with
> awk '$1 == "cpu" { print $7; exit }' /proc/stat ?
> column 7 is cumulative centiseconds spent in (hard)irq since system boot.
> What would the HealthCPU agent do with that?
>
> What you would need to do is sample /proc/stat twice, with a delay of,
> say, 1 to 3 seconds, calculate the relative cpu time spent in idle,
> and get that calculation right.
> That certainly can be done in bash, even from bash,
> though to grab the "^cpu " line from stat I'll use grep anyways,
> that's faster than trying a "while read a b c ... case a in "cpu "*) ..."
> at least in my experience.
>
> #!/bin/bash
> percent_cpu_spent_doing_stuff()
> {
>     local delay=${1:-2}
>     # first sample: counters of the aggregate "cpu" line, label shifted away
>     set -- $(grep '^cpu ' < /proc/stat );
>     shift;
>     local u=${1:-0} n=${2:-0} s=${3:-0} i=${4:-0} io=${5:-0} irq=${6:-0} \
>         softirq=${7:-0} steal=${8:-0} guest=${9:-0} guest_nice=${10:-0};
>     sleep "$delay";
>     # second sample
>     set -- $(grep '^cpu ' < /proc/stat );
>     shift;
>     local sum0=$((u+n+s+i+io+irq+softirq+steal+guest+guest_nice));
>     local sum1=$((${1:-0} + ${2:-0} + ${3:-0} + ${4:-0} + ${5:-0} + ${6:-0} +
>         ${7:-0} + ${8:-0} + ${9:-0} + ${10:-0}));
>     local d=$((sum1-sum0));
>     echo "$sum1 - $sum0 = $d";
>     # per-counter deltas between the two samples
>     u=$((${1:-0} - u)) n=$((${2:-0} - n)) s=$((${3:-0} - s)) i=$((${4:-0} - i));
>     io=$((${5:-0} - io)) irq=$((${6:-0} - irq)) softirq=$((${7:-0} - softirq));
>     steal=$((${8:-0} - steal)) guest=$((${9:-0} - guest)) guest_nice=$((${10:-0} - guest_nice));
>     local x;
>     for x in u n s i io irq softirq steal guest guest_nice;
>     do
>         # each delta as a percentage of all ticks elapsed
>         printf "%12s = %3d%%\n" $x $((${!x}*100/d));
>     done
> }
>
> percent_cpu_spent_doing_stuff
>
> But why would we want to do that?
>
> Would it be sufficient to add a "delay" parameter to the HealthCPU agent,
> and pass it to "top -b -n2 -d $delay"? So anyone who wants a completely
> useless erratically fluctuating cpu usage measure can use delay=0.1,
> and others can pass in delay=10 ?
> _______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
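
The two-sample approach from the quoted message can also be written with awk doing the arithmetic, which reduces it to a single idle figure such as a monitor might compare against a threshold. This is only a sketch under stated assumptions: the function names idle_percent_between and sample_idle_percent are made up, the default delay of 2 seconds is arbitrary, and only the counters user through steal (fields 2 to 9) are summed, on the assumption that the kernel already includes guest time in user and nice.

```shell
#!/bin/bash
# Hypothetical helper: percentage of CPU time spent idle between two
# samples of the aggregate "cpu" line from /proc/stat.
# Arguments: the earlier line, then the later line.
idle_percent_between()
{
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            # Sum user through steal (fields 2..9); field 5 is idle.
            total = 0
            last = (NF < 9) ? NF : 9
            for (f = 2; f <= last; f++) total += $f
            idle = $5
        }
        NR == 1 { total0 = total; idle0 = idle }
        NR == 2 {
            d = total - total0
            if (d <= 0) exit 1    # no ticks elapsed, nothing to report
            printf "%d\n", (idle - idle0) * 100 / d
        }'
}

# Hypothetical wrapper: sample /proc/stat twice, "delay" seconds apart.
sample_idle_percent()
{
    local delay=${1:-2} first second
    first=$(grep '^cpu ' /proc/stat) || return 1
    sleep "$delay"
    second=$(grep '^cpu ' /proc/stat) || return 1
    idle_percent_between "$first" "$second"
}
```

For example, sample_idle_percent 3 waits three seconds between the two reads, matching the default top delay observed in the strace output. Keeping the arithmetic in a separate function that takes the two lines as arguments makes it possible to check the calculation against fixed input without sleeping.
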
