O_O. Good to know! On 2/26/14, 11:58 PM, "Marty Sweet" <msweet....@gmail.com> wrote:
>Hi Guys,
>
>Does anyone have any ideas about this?
>My main concern is the KVM resource collector and I assume the other
>hypervisor setups are receiving the wrong values.
>
>The CSAgent periodically runs the following command:
>[kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
>
>===
>When running top manually I get the same results (2 secs after each
>command):
>======
>root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>root@aurora:/var/log/cloudstack/agent# top -b -n2 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>Cpu(s): 29.2%us, 1.1%sy, 0.0%ni, 69.1%id, 0.5%wa, 0.0%hi, 0.1%si, 0.0%st
>=======
>Apparently this is because:
>"This is because top, vmstat, iostat all in their first run collect
>data since the last reboot time of the system.
>And the successive iterations run on the sampling period that you
>specify. So, in the first run of top, you will see the %idle time
>because from the time of reboot to the time of running top, it was
>that much % idle. But in next iterations, since it is busy it doesn't
>show any %idle.
>Exclude the first iteration and try sampling over the interval you want."
>http://serverfault.com/questions/436446/top-showing-64-idle-on-first-screen-or-batch-run-while-there-is-no-idle-time-a
>========
>
>Wouldn't this result in Cloudstack-Agent getting the wrong idle value
>for the system?
>
>This hasn't been fixed in 4.3.0, so I will create a patch along the
>following lines (if others agree):
>/bin/bash -c idle=$(top -d0.10 -b -n 2|grep Cpu\(s\):|tail -n1|cut -d% -f4|cut -d, -f2);echo $idle
>-> Where top -d0.10 changes the refresh interval so the command is
>faster to complete.
>-> tail -n1 gets the last line of the output (the latest idle value)
>===
>
>
>Let me know what you think,
>Regards,
>Marty
>
>
>---------- Forwarded message ----------
>From: Marty Sweet <msweet....@gmail.com>
>Date: Sun, Feb 23, 2014 at 1:20 PM
>Subject: Segfault: Top & Sampling Rates (kvm.resource.LibvirtComputingResource)
>To: "dev@cloudstack.apache.org" <dev@cloudstack.apache.org>
>Cc: "us...@cloudstack.apache.org" <us...@cloudstack.apache.org>
>
>
>Hi,
>
>I have just noticed the following occasional error messages in kern.log.
>This is happening on all but 1 of my nodes. Is anyone else
>experiencing this issue?
>=====
>Feb 23 06:53:24 aurora kernel: [10185338.400091] top[27631]: segfault at 0 ip 00007f025eba3315 sp 00007fff3f9ed308 error 4 in libc-2.15.so[7f025ea6f000+1b5000]
>=====
>
>I happened to have one of the nodes in trace mode, showing
>cloudstack-agent is starting it:
>
>/var/log/cloudstack/agent/agent.log
>======
>2014-02-23 06:53:23,654 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-1:null) Executing: /bin/bash -c idle=$(top -b -n 1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
>2014-02-23 06:53:23,661 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-2:null) Executing: /bin/bash -c idle=$(top -b -n 1|grep Cpu\(s\):|cut -d% -f4|cut -d, -f2);echo $idle
>======
>
>## This led me on to find the following (potential) bug:
>
>When running this manually I get the same result (2 secs before each
>command):
>======
>root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>root@aurora:/var/log/cloudstack/agent# top -b -n1 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>root@aurora:/var/log/cloudstack/agent# top -b -n2 | grep Cpu
>Cpu(s): 6.3%us, 1.0%sy, 0.0%ni, 92.4%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
>Cpu(s): 29.2%us, 1.1%sy, 0.0%ni, 69.1%id, 0.5%wa, 0.0%hi, 0.1%si, 0.0%st
>=======
>Apparently this is because:
>"This is because top, vmstat, iostat all in their first run collect
>data since the last reboot time of the system.
>And the successive iterations run on the sampling period that you
>specify. So, in the first run of top, you will see the %idle time
>because from the time of reboot to the time of running top, it was
>that much % idle. But in next iterations, since it is busy it doesn't
>show any %idle.
>Exclude the first iteration and try sampling over the interval you want."
>http://serverfault.com/questions/436446/top-showing-64-idle-on-first-screen-or-batch-run-while-there-is-no-idle-time-a
>========
>
>Wouldn't this result in Cloudstack-Agent getting the wrong idle value
>for the system?
>
>If this hasn't been fixed in 4.3.0, I will create a patch along the
>following lines (if others agree):
>/bin/bash -c idle=$(top -d0.01 -b -n 2|grep Cpu\(s\):|tail -n1|cut -d% -f4|cut -d, -f2);echo $idle
>-> Where top -d0.01 changes the refresh interval so the command is
>faster to complete.
>-> tail -n1 gets the last line of the output (the latest idle value)
>
>Ubuntu 12.04 / KVM / CS 4.2.0
>Linux aurora 3.5.0-34-generic #55~precise1-Ubuntu SMP Fri Jun 7 16:25:50 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
>
>Thanks,
>Marty
>
>
>--
>Marty
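
For anyone who wants to try the idea before a patch lands, here is a rough sketch of the "take the last sample" approach from the quoted message, wrapped in a function so it can be checked against captured top output. The name last_idle is made up for illustration and is not part of CloudStack. It assumes the older procps line layout shown in the thread ("92.4%id"); other top versions format the Cpu line differently and would need a different parser.

```shell
#!/bin/bash
# Sketch only: last_idle is a hypothetical helper, not CloudStack code.
# Reads `top -b` output on stdin and prints the idle percentage taken from
# the LAST "Cpu(s):" line, i.e. the most recent sample, not the first one.
# Assumes the old procps layout quoted in this thread ("92.4%id").
last_idle() {
    local idle
    # Same cut pipeline as the agent command, plus tail to keep the last line.
    idle=$(grep 'Cpu(s):' | tail -n 1 | cut -d% -f4 | cut -d, -f2)
    # Left unquoted on purpose: word splitting strips the leading space.
    echo $idle
}

# Intended use on a host (not executed here):
#   top -d 0.1 -b -n 2 | last_idle
```

Fed the two Cpu lines from the `top -b -n2` run quoted above, this picks the second sample's idle figure and ignores the first one. A very short delay keeps the call quick, but it also makes the second sample cover a very small window, so the interval is probably worth tuning.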