Soft Lockup/Clock Drift Test Setup
The basic setup we used to test for soft lockup and clock drift issues among
guests in our environment was to generate guest and host CPU load to induce
high scheduling latencies for guest vCPUs. The guests were then monitored from a
separate machine using a "heartbeat" monitor that records periodic heartbeat
messages from the guests and compares the guest timestamp in each message to the
local timestamp to calculate clock drift.
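The exact drift formula used by the monitor is not given here. The sample output line in step 7 shows a guest timestamp about 1.9 s behind the local one but a reported drift of only +0.01, so drift is probably measured relative to an initial offset rather than as the raw difference. The sketch below assumes exactly that: drift is the change in the guest-minus-local offset since the first heartbeat from that guest. The class name DriftTracker is hypothetical and is not taken from heartbeat_slu.py.

```python
class DriftTracker:
    """Per-guest drift bookkeeping (hypothetical; not taken from heartbeat_slu.py).

    Assumes drift is the change in the guest-minus-local clock offset since the
    first heartbeat, so a constant offset or constant network delay is ignored.
    """

    def __init__(self):
        self.base_offset = None  # guest - local offset seen at the first heartbeat
        self.total_drift = 0.0   # total drift reported for the previous heartbeat

    def update(self, local_ts, guest_ts):
        """Record one heartbeat and return (total_drift, drift_delta) in seconds."""
        offset = guest_ts - local_ts
        if self.base_offset is None:
            self.base_offset = offset
        total = offset - self.base_offset
        delta = total - self.total_drift  # change since the previous heartbeat
        self.total_drift = total
        return total, delta
```

A server would keep one tracker per guest hostname, for example with collections.defaultdict(DriftTracker), and call update() for every heartbeat it receives.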
Heartbeats that arrive late by more than a configured number of seconds (10
seconds in our case), as well as heartbeats that have not been delivered for
some period of time, cause soft lockup alerts to be written to the heartbeat
monitor's log file. Note that other factors, such as network issues, can also
cause delayed or missed heartbeats, so the actual number of soft lockups should
be collected from within each guest.
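The two alert conditions above (a late heartbeat and a guest that has gone silent) can be sketched as follows. This is an illustration under stated assumptions, not the monitor's actual logic: it assumes a heartbeat is "due" one client interval after the previous one, and that lateness is measured from that due time. The function names are hypothetical.

```python
def is_late(prev_arrival, arrival, interval=1.0, threshold=10.0):
    """True if a heartbeat arrived more than `threshold` s after it was due.

    Assumption: the heartbeat was due `interval` seconds after the previous one.
    """
    return arrival - (prev_arrival + interval) > threshold


def silent_guests(last_seen, now, interval=1.0, threshold=10.0):
    """Return guests whose next heartbeat is overdue by more than `threshold` s.

    `last_seen` maps guest hostname -> arrival time of its latest heartbeat.
    """
    return sorted(
        host for host, ts in last_seen.items()
        if now - (ts + interval) > threshold
    )
```

Checking silence separately matters because a guest stuck in a long lockup sends nothing at all, so is_late() alone would only fire once the guest recovers.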
I only have the test cases and haven't automated them yet. Could you help
review them?
Thanks to Mike Roath for providing the required help.
Steps:
1. Install the stress utility on the host and on each guest:
http://weather.ou.edu/~apw/projects/stress/stress-1.0.2.tar.gz
2. Set up the test:
Set up the heartbeat monitor, with the server running on the monitoring host
and a client running on each guest. For example:
./heartbeat_slu.py --server --threshold 10 --file /tmp/heartbeat_server.out --verbose --check-drift
./heartbeat_slu.py --client --address <heartbeat server ip> --interval 1
3. Start the stress utility on each guest (or guest configuration) to be
tested by issuing the following command:
screen stress -c <num_threads>
where <num_threads> should be twice the number of vCPUs allocated to the guest.
4. Start the stress utility on the host by issuing the following command:
screen stress -c <num_threads>
5. Let this run for an extended period, possibly more than 12 hours.
6. On each guest, tally the number of soft lockups that were logged. For
example, the following counts the soft lockups that occurred on Mar 17 and
Mar 18:
grep "soft lockup" /var/log/messages* | grep -E "Mar (17|18)" | wc -l
7. For each guest, collect the drift statistics from the heartbeat monitor
server log (in our case, /tmp/heartbeat_server.out):
grep "my_guest_hostname" /tmp/heartbeat_server.out | grep drift | tail -1
The output will be of the form:
<local timestamp>: <guest hostname> <sequence number> <guest timestamp> (drift <total drift> (<drift delta>))
1269891443.99: my_guest_hostname 331434 1269891442.08 (drift +0.01 (-0.00))
<total drift> is the value we're most interested in from this line.
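If the result collection in steps 6 and 7 is automated later, a starting point might look like the sketch below. It assumes the server log lines look exactly like the sample above and that guest syslog lines start with "Mon DD"; the function names are hypothetical. Unlike the grep pipeline, latest_drift() gathers the last drift line for every guest in one pass instead of one grep per hostname.

```python
import re

# Matches server log lines of the form shown above, e.g.
# 1269891443.99: my_guest_hostname 331434 1269891442.08 (drift +0.01 (-0.00))
NUM = r"[+-]?\d+(?:\.\d+)?"
DRIFT_RE = re.compile(
    rf"^(?P<local>{NUM}): (?P<host>\S+) (?P<seq>\d+) (?P<guest>{NUM}) "
    rf"\(drift (?P<total>{NUM}) \((?P<delta>{NUM})\)\)"
)


def count_soft_lockups(lines, month="Mar", days=("17", "18")):
    """Step 6: count syslog lines that report a soft lockup on the given days."""
    date_re = re.compile(rf"^{month} ({'|'.join(days)}) ")
    return sum(1 for line in lines if "soft lockup" in line and date_re.match(line))


def latest_drift(lines):
    """Step 7: map guest hostname -> (total_drift, drift_delta) from its last drift line."""
    latest = {}
    for line in lines:
        m = DRIFT_RE.match(line)
        if m:  # later lines overwrite earlier ones, like `tail -1`
            latest[m["host"]] = (float(m["total"]), float(m["delta"]))
    return latest
```

On the monitor, latest_drift(open("/tmp/heartbeat_server.out")) yields the <total drift> for every guest at once. On a guest, the lines of each /var/log/messages* file can be fed to count_soft_lockups(); gzip-compressed rotated logs would need to be opened with the gzip module instead.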
--Pradeep
_______________________________________________
Autotest mailing list
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest