On Fri, 2011-06-17 at 14:16 +0530, Pradeep Kumar wrote:
> Soft Lockup/Clock Drift Test Setup
>
> The basic setup used for testing for soft lockup and clock drift issues
> amongst guests in our environment was to generate guest and host cpu load to
> induce high scheduling latencies for guest vcpus. Guests were then monitored
> from a separate machine using a "heartbeat" monitor that records periodic
> heartbeat messages from the guests and compares guest timestamp information
> to the local timestamp to calculate clock drift.
>
> Heartbeats that are late by a certain number of seconds, in our case
> configured to be 10 seconds, as well as heartbeats which have not been
> delivered for some period of time, cause soft lockup alerts to be generated
> in the heartbeat monitor's log file. It is important to note that other
> factors, such as network issues, can account for these delayed/missed
> heartbeats. Actual data on # of soft lockups should be collected from within
> each guest.
>
> I only have test cases so far and haven't automated them. Could you help
> review them?
Sure.

> Thanks Mike roath for providing required help.
>
> Steps:
> 1. Install stress utility on host and guest
> http://weather.ou.edu/~apw/projects/stress/stress-1.0.2.tar.gz

There's an autotest test that runs stress, so it could be reused. Now,
it's also important to note that we could just run some shell one-liners
to produce similar cpu loads. The advantage of using stress as a load
generator is that it can generate several types of stress (CPU, I/O,
memory, disk). I am not objecting to the use of this program, just
pointing out we have other options.

> 2. Setup test:
>
> Set up the heartbeat monitor on both host and guest.
> For ex:
>
> ./heartbeat_slu.py --server --threshold 10 --file /tmp/heartbeat_server.out --verbose --check-drift
>
> ./heartbeat_slu.py --client --address <heartbeat server ip> --interval 1

^ Need to verify which TCP port is needed for process communication and
ensure it'll be open. Also, do the logs in the client case go to syslog
or somewhere else?

> 3. Start the stress utility on each guest/guest configuration that is to be
> tested by issuing the following command:
>
> screen stress -c <num_threads>
>
> where <num_threads> should be twice the number of vcpus allocated to the
> guest.
>
> 4. Start the stress utility on the host by issuing the following command:
> screen stress -c <num_threads>
>
> 5. Let this run for an extended period, maybe more than 12 hours.

Ok, maybe we can have 3 trials on a reference configuration with 12, 24
and 48 hours and then analyze the results to see which one would be
better to pick.

> 6. On each guest, tally the number of soft lockups that were logged.
> Something like the following would tally up soft lockups that occurred on Mar
> 17 and Mar 18th, for example.
> grep "soft lockup" /var/log/messages* | grep -E "Mar (17|18)"
>
> 7. For each guest, collect the drift statistics from the heartbeat monitor
> server log.
> In our case, /tmp/heartbeat_server.out:
>
> grep "my_guest_hostname" /tmp/heartbeat_server.out | grep drift | tail -1
>
> Output will be of the form:
> <local timestamp>: <guest hostname> <sequence number> <guest timestamp>
> (drift <total drift> (<drift delta>))
> 1269891443.99: my_guest_hostname 331434 1269891442.08 (drift +0.01 (-0.00))
> <total drift> is the value we're most interested in from this line.

Sounds reasonable. In case you plan on automating it, know that we also
have methods in kvm autotest to evaluate the time drift, using simple
clock timestamps or NTP based measurements.

> --Pradeep
_______________________________________________
Autotest mailing list
[email protected]
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
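P.S. If step 7 does get automated, the grep/tail pipeline could be sketched
in Python roughly as below. This is only an illustration: it assumes the
server log lines look exactly like the example line quoted in step 7, and
the names parse_drift_line and last_total_drift are hypothetical helpers,
not something taken from heartbeat_slu.py or autotest.

```python
import re

# Assumed line format, copied from the example in step 7:
# <local ts>: <hostname> <seq> <guest ts> (drift <total> (<delta>))
DRIFT_RE = re.compile(
    r"^(?P<local_ts>\d+\.\d+): (?P<host>\S+) (?P<seq>\d+) "
    r"(?P<guest_ts>\d+\.\d+) \(drift (?P<total>[+-]?\d+\.\d+) "
    r"\((?P<delta>[+-]?\d+\.\d+)\)\)\s*$"
)


def parse_drift_line(line):
    """Return a dict of the fields in one drift line, or None if no match."""
    m = DRIFT_RE.match(line)
    if m is None:
        return None
    return {
        "local_ts": float(m.group("local_ts")),
        "host": m.group("host"),
        "seq": int(m.group("seq")),
        "guest_ts": float(m.group("guest_ts")),
        "total_drift": float(m.group("total")),
        "drift_delta": float(m.group("delta")),
    }


def last_total_drift(lines, hostname):
    """Total drift from the last matching line for hostname (like tail -1)."""
    last = None
    for line in lines:
        rec = parse_drift_line(line)
        if rec is not None and rec["host"] == hostname:
            last = rec["total_drift"]
    return last
```

Passing an open file object for /tmp/heartbeat_server.out as `lines` would
give the same value the grep | tail -1 pipeline shows, provided the format
assumption holds.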

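Similarly, the tally in step 6 could be done without grep. A minimal
sketch, assuming syslog-style lines that begin with the date (e.g.
"Mar 17 ...") and contain the string "soft lockup"; count_soft_lockups
is a hypothetical name used only for illustration.

```python
def count_soft_lockups(lines, date_prefixes):
    """Count lines that mention a soft lockup and start with a given date.

    lines: any iterable of log lines (an open file works).
    date_prefixes: date strings as they appear at the start of each line,
    for example ["Mar 17", "Mar 18"].
    """
    prefixes = tuple(date_prefixes)
    return sum(
        1
        for line in lines
        if "soft lockup" in line and line.startswith(prefixes)
    )


# Possible usage inside a guest (path as in step 6):
#   with open("/var/log/messages") as f:
#       print(count_soft_lockups(f, ["Mar 17", "Mar 18"]))
```

Note that this anchors the date at the start of the line, which is slightly
stricter than the grep -E in step 6, and it reads one file at a time rather
than /var/log/messages*.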