Re: Continuous TCK Testing

Jason Dillon Sat, 18 Oct 2008 01:26:00 -0700

Before, when I had those 2 build machines running in my apartment in
Berkeley, I set up one xen domain specifically for running monitoring
tools and installed cacti on it, then set up snmpd on each of the
other machines, configured to allow access from the xen monitoring
domain. This provided a very detailed, easy-to-grok monitoring console
for the build agents.


--jason



On Oct 18, 2008, at 5:58 AM, Jay D. McHugh wrote:

Hey Kevan,

Regarding monitoring...

I managed to run into xenmon.py.
It appears to log the system utilization for the whole box as well as
each VM to log files in 'your' home directory if you specify the '-n'
flag.

Here is the help page for xenmon.py:
[EMAIL PROTECTED]:~$ sudo python /usr/sbin/xenmon.py -h
Usage: xenmon.py [options]

Options:
 -h, --help            show this help message and exit
 -l, --live            show the ncurses live monitoring frontend (default)
 -n, --notlive         write to file instead of live monitoring
 -p PREFIX, --prefix=PREFIX
                       prefix to use for output files
 -t DURATION, --time=DURATION
                       stop logging to file after this much time has
                       elapsed (in seconds). set to 0 to keep logging
                       indefinitely
 -i INTERVAL, --interval=INTERVAL
                       interval for logging (in ms)
 --ms_per_sample=MSPERSAMPLE
                       determines how many ms worth of data goes in
                       a sample
 --cpu=CPU             specifies which cpu to display data for
 --allocated           Display allocated time for each domain
 --noallocated         Don't display allocated time for each domain
 --blocked             Display blocked time for each domain
 --noblocked           Don't display blocked time for each domain
 --waited              Display waiting time for each domain
 --nowaited            Don't display waiting time for each domain
 --excount             Display execution count for each domain
 --noexcount           Don't display execution count for each domain
 --iocount             Display I/O count for each domain
 --noiocount           Don't display I/O count for each domain

And here is some sample output:

[EMAIL PROTECTED]:~$ cat log-dom0.log
# passed cpu dom cpu(tot) cpu(%) cpu/ex allocated/ex blocked(tot) blocked(%) blocked/io waited(tot) waited(%) waited/ex ex/s io(tot) io/ex
0.000 0 0 2.086 0.000 38863.798 30000000.000 154.177 0.000 0.000 0.504 0.000 9383.278 0.000 0.000 0.000
2.750 1 0 2.512 0.000 53804.925 30000000.000 153.217 0.000 0.000 0.316 0.000 6774.813 0.000 0.000 0.000
4.063 2 0 2.625 0.000 59959.942 30000000.000 153.886 0.000 0.000 0.173 0.000 3939.987 0.000 0.000 0.000
5.203 3 0 3.020 0.000 47522.430 30000000.000 171.834 0.000 0.000 0.701 0.000 11031.759 0.000 0.000 0.000
6.403 4 0 2.130 0.000 39256.871 30000000.000 171.870 0.000 0.000 0.617 0.000 11378.014 0.000 0.000 0.000
9.230 6 0 0.836 0.000 53962.875 30000000.000 57.287 0.000 0.000 0.038 0.000 2450.488 0.000 0.000 0.000
10.305 7 0 2.171 0.000 46119.247 30000000.000 154.008 0.000 0.000 0.367 0.000 7804.444 0.000 0.000 0.000
11.518 0 0 15931680.822 1.593 54019.023 30000000.000 889706824.191 88.971 0.000 2630292.436 0.263 8918.446 294.927 0.000 0.000
1009.216 1 0 7687035.544 0.769 53822.548 30000000.000 473101345.004 47.310 0.000 864964.568 0.086 6056.248 142.822 0.000 0.000
1010.199 2 0 20502235.224 2.050 61655.293 30000000.000 979188763.754 97.919 0.000 4279443600.516 427.944 12869345.608 332.530 0.000 0.000
1011.239 3 0 13634865.766 1.363 45934.870 30000000.000 985479796.363 98.548 0.000 1593248.596 0.159 5367.538 296.830 0.000 0.000
1012.312 4 0 18228049.181 1.823 61242.790 30000000.000 979822521.396 97.982 0.000 2593364.560 0.259 8713.213 297.636 0.000 0.000
1013.338 5 0 9891757.872 0.989 65386.046 30000000.000 571275802.794 57.128 0.000 357431.539 0.036 2362.678 151.282 0.000 0.000

We could probably add a cron job to grab a single sample every X
minutes and append them together to build up a utilization history
(rather than simply running it all of the time).
I just tried to get a single sample, and the smallest run I could get
was about three seconds with four samples taken.
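
As a rough illustration of how appended xenmon samples could be turned into a history, here is a minimal Python sketch. It assumes the log has one whitespace-separated record per line under a '#' header line, as in log-dom0.log. The function names and the SAMPLE_LOG constant are made up for this example and are not part of xenmon.py; the two data rows are copied from the sample output.

```python
# Hypothetical helper for reading logs written by "xenmon.py -n".
# Assumption: one whitespace-separated record per line, with the
# column names on a header line that starts with '#'.

SAMPLE_LOG = """\
# passed cpu dom cpu(tot) cpu(%) cpu/ex allocated/ex blocked(tot) blocked(%) blocked/io waited(tot) waited(%) waited/ex ex/s io(tot) io/ex
0.000 0 0 2.086 0.000 38863.798 30000000.000 154.177 0.000 0.000 0.504 0.000 9383.278 0.000 0.000 0.000
11.518 0 0 15931680.822 1.593 54019.023 30000000.000 889706824.191 88.971 0.000 2630292.436 0.263 8918.446 294.927 0.000 0.000
"""


def parse_xenmon_log(text):
    """Return a list of dicts, one per record, keyed by column name."""
    columns = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: remember the column names.
            columns = line.lstrip("#").split()
            continue
        if columns is None:
            continue  # data before any header cannot be labelled
        values = line.split()
        if len(values) != len(columns):
            continue  # skip truncated or wrapped records
        rows.append(dict(zip(columns, (float(v) for v in values))))
    return rows


def mean_cpu_percent(rows):
    """Average the cpu(%) column over the given records."""
    if not rows:
        return 0.0
    return sum(r["cpu(%)"] for r in rows) / len(rows)
```

Calling parse_xenmon_log() on the contents of each appended log file and keeping only the columns of interest (for example cpu(%) and blocked(%)) would be one way to build the utilization history.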

Or, I also tried xentop in batch mode:

[EMAIL PROTECTED]:~$ sudo xentop -b -i 1
NAME      STATE CPU(sec) CPU(%)  MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO  VBD_RD  VBD_WR       SSID
Domain-0 -----r   430567    0.0 3939328   23.5   nolimit       n/a     8    4        0        0    0      0       0       0 2149631536
tck01    --b---   750449    0.0 3145728   18.8   3145728      18.8     2    1   483054  1855493    1     15  655667 8445829 2149631536
tck02    --b---  1101273    0.0 3145728   18.8   3145728      18.8     2    1   367792  1773407    1     83 1131709 9030663 2149631536
tck03    -----r   144552    0.0 3145728   18.8   3145728      18.8     2    1   188115  2370069    1      6  370431 1290683 2149631536
tck04    --b---   103742    0.0 3145728   18.8   3145728      18.8     2    1   286936  2341941    1      7  381523 1484476 2149631536

It looks to me like having a cron job that periodically runs xentop and
builds up a history would be the best option (without digging through
a ton of different specialized monitor packages).
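
A minimal sketch of what such a cron-driven collector might look like, in Python. It assumes the 'xentop -b -i 1' invocation used above and the column names from that output. The function names, the CSV layout, and the choice of columns are made up for illustration. Since CPU(%) reads 0.0 for every domain in the single-iteration output above, the sketch records the cumulative CPU(sec) counter instead, so usage over an interval can be derived by subtracting consecutive samples.

```python
import csv
import subprocess
import time


def parse_xentop(text):
    """Parse 'xentop -b' batch output into a list of dicts (string values)."""
    columns = None
    rows = []
    for line in text.splitlines():
        # If MAXMEM(k) is printed as the two words "no limit", join them
        # so that a plain whitespace split keeps the columns aligned.
        fields = line.replace("no limit", "nolimit").split()
        if not fields:
            continue
        if fields[0] == "NAME":
            columns = fields  # header row
            continue
        if columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, fields)))
    return rows


def append_history(rows, path, timestamp):
    """Append one CSV line per domain to the history file at 'path'."""
    with open(path, "a", newline="") as handle:
        writer = csv.writer(handle)
        for row in rows:
            writer.writerow([
                timestamp,
                row["NAME"],
                row["CPU(sec)"],   # cumulative; diff two samples for usage
                row["MEM(k)"],
                row["NETTX(k)"],
                row["NETRX(k)"],
                row["VBD_RD"],
                row["VBD_WR"],
            ])


def collect(path):
    """Take one xentop sample and append it to 'path' (needs Xen privileges)."""
    result = subprocess.run(
        ["xentop", "-b", "-i", "1"],
        capture_output=True, text=True, check=True,
    )
    append_history(parse_xentop(result.stdout), path, int(time.time()))
```

A small script that calls collect() with a log path of your choosing could then be run from a crontab with enough privileges to run xentop (sudo was needed in the run above).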


Jay

Kevan Miller wrote:
On Oct 10, 2008, at 11:29 AM, Kevan Miller wrote:
On Oct 10, 2008, at 11:25 AM, Kevan Miller wrote:
On Oct 8, 2008, at 11:56 PM, Kevan Miller wrote:
On Oct 8, 2008, at 4:31 PM, Jason Warner wrote:
We had some suggestions earlier for some alternate means of
implementing this (Hudson, Continuum, etc...). Now that we've had
Jason Dillon provide an overview of what we had in place before,
does anyone have thoughts on what we should go with? I'm thinking
we should stick with the AHP based solution.  It will need to be
updated most likely, but it's been tried and tested and shown to
meet our needs.  I'm wondering, though, why we stopped using it
before.  Was there a specific issue we're going to have to deal
with again?
IIRC, the overwhelming reason we stopped using it before was because
of hosting issues -- spotty networking, hardware failures, poor colo
support, etc. We shouldn't have any of these problems, now. If we do
run into problems, they should now be fixable. I have no reason to
favor Hudson/Continuum over AHP. So, if we can get AHP running
easily, I'm all for it. There's only one potential issue, that I'm
aware of.

We previously had an Open Source License issued for our use of
Anthill. Here's some of the old discussion --
http://www.nabble.com/Geronimo-build-automation-status-(longish)-tt7649902.html#a7649902


Although the board was aware of our usage of AntHill, since we
weren't running AntHill on ASF hardware, I'm not sure the license
was fully vetted by Infra. I don't see any issues, but I'll want to
run this by Infra.
Jason D, will the existing license cover the version of AntHill that
we'll want to use? I'll run the license by Infra and will also
describe the issue for review by the Board, in our quarterly report.
Heh. Oops. Just noticed that I sent the following to myself and not the
dev list. I hate when I do that...
One more thing... from emails on [EMAIL PROTECTED] looks like
Infra is cool with us running Anthill on selene and phoebe.
BTW, am planning on installing monitoring software over the weekend on
selene and phoebe. The board is interested in monitoring our usage...
Also, we now have a new AntHill license for our use. I've placed the
license in ~kevan/License2.txt on phoebe and selene. This license should
only be used for Apache use. So, should not be placed in a public
location (e.g.  our public svn tree).
Regarding monitoring software -- I haven't been able to get it to work
yet. vmstat/iostat don't work, unless you run on every virtual machine.
'xm top' gathers data on all domains, however, doesn't make the data
easy to tuck away in a log file/available to snmp... Advice welcome...
--kevan
