Soft Lockup/Clock Drift Test Setup

The basic setup we used to test for soft lockup and clock drift issues among
guests in our environment was to generate guest and host CPU load, which induces
high scheduling latencies for guest vCPUs. The guests were then monitored from a
separate machine using a "heartbeat" monitor that records periodic heartbeat
messages from the guests and compares the guest timestamp information to the
local timestamp to calculate clock drift.
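To make the drift calculation concrete, here is a minimal Python sketch. It is not taken from heartbeat_slu.py: it assumes, hypothetically, that each heartbeat carries the guest's wall-clock time and that drift is measured relative to the guest/monitor clock offset seen on the first heartbeat, so a constant offset or a fixed network delay is not counted as drift. The `DriftTracker` class and its `record` method are illustrative names only.

```python
"""Sketch of heartbeat-based clock drift tracking (hypothetical, not heartbeat_slu.py).

Assumption: drift is the change in (local - guest) clock offset since the
first heartbeat received from a given guest.
"""
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class DriftTracker:
    # Offset (local - guest) observed on the first heartbeat from each guest.
    base_offset: Dict[str, float] = field(default_factory=dict)
    # Most recent total drift per guest, used to compute per-heartbeat deltas.
    last_drift: Dict[str, float] = field(default_factory=dict)

    def record(self, guest: str, guest_ts: float, local_ts: float) -> Tuple[float, float]:
        """Return (total_drift, drift_delta) in seconds for one heartbeat.

        A positive total drift means the guest clock has fallen behind the
        monitor's clock since the first heartbeat.
        """
        offset = local_ts - guest_ts
        base = self.base_offset.setdefault(guest, offset)
        total = offset - base
        delta = total - self.last_drift.get(guest, 0.0)
        self.last_drift[guest] = total
        return total, delta
```

For example, a first heartbeat `record("g", 100.0, 101.0)` establishes a 1 s offset and reports no drift; a later `record("g", 110.0, 111.5)` reports 0.5 s of total drift, because the offset grew from 1.0 s to 1.5 s.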


Heartbeats that are late by a certain number of seconds (10 seconds in our
configuration), as well as heartbeats that have not been delivered for some
period of time, cause soft lockup alerts to be generated in the heartbeat
monitor's log file. Note that other factors, such as network issues, can also
account for delayed or missed heartbeats, so the actual number of soft lockups
should be collected from within each guest.
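The late/missed heartbeat check can be sketched as follows. This assumes, as one simple interpretation, that a heartbeat counts as "late" when the gap since the previous heartbeat from the same guest exceeds the threshold, and that a guest is "silent" when its last heartbeat is older than the threshold. The function names and the 10-second constant are illustrative, not heartbeat_slu.py's actual code.

```python
"""Sketch of late/missed heartbeat detection (hypothetical helpers).

Timestamps are monitor-side arrival times in seconds.
"""
from typing import Dict, List, Tuple

THRESHOLD_S = 10.0  # matches the 10-second threshold used in our setup


def late_heartbeats(arrivals: List[float], threshold: float = THRESHOLD_S) -> List[Tuple[float, float]]:
    """Return (arrival_time, gap) for each heartbeat whose gap exceeds the threshold."""
    late = []
    for prev, cur in zip(arrivals, arrivals[1:]):
        gap = cur - prev
        if gap > threshold:
            late.append((cur, gap))
    return late


def silent_guests(last_seen: Dict[str, float], now: float, threshold: float = THRESHOLD_S) -> List[str]:
    """Return guests whose most recent heartbeat is older than the threshold."""
    return sorted(g for g, ts in last_seen.items() if now - ts > threshold)
```

As noted above, a hit from either check only means the monitor did not hear from the guest in time; it does not prove the guest itself hit a soft lockup.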


I only have the test cases; I haven't automated them yet. Could you help review
them? Thanks to Mike Roath for providing the required help.

Steps:
1. Install the stress utility on the host and on each guest:
        http://weather.ou.edu/~apw/projects/stress/stress-1.0.2.tar.gz
2. Set up the test:

        Set up the heartbeat monitor: the server side on the monitoring
        machine and the client side on each guest. For example:

     ./heartbeat_slu.py --server --threshold 10 --file /tmp/heartbeat_server.out --verbose --check-drift

     ./heartbeat_slu.py --client --address <heartbeat server ip> --interval 1

3. Start the stress utility on each guest (or guest configuration) to be
tested by issuing the following command:

screen stress -c <num_threads>

where <num_threads> should be twice the number of vCPUs allocated to the guest.

4. Start the stress utility on the host by issuing the following command:

screen stress -c <num_threads>

5. Let this run for an extended period, e.g. more than 12 hours.

6. On each guest, tally the number of soft lockups that were logged. For
example, the following counts the soft lockups that occurred on Mar 17 and
Mar 18:

grep "soft lockup" /var/log/messages* | grep -E "Mar (17|18)" | wc -l

7. For each guest, collect the drift statistics from the heartbeat monitor
server log (in our case, /tmp/heartbeat_server.out):

grep "my_guest_hostname" /tmp/heartbeat_server.out | grep drift | tail -1

The output will be of the form:

<local timestamp>: <guest hostname> <sequence number> <guest timestamp> (drift <total drift> (<drift delta>))

For example:
1269891443.99: my_guest_hostname 331434 1269891442.08 (drift +0.01 (-0.00))

<total drift> is the value we're most interested in from this line.
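Since the steps above are not automated yet, here is a rough Python sketch of helpers that could serve as a starting point for steps 3, 6 and 7. The function names and the regex are hypothetical: the regex is written only against the sample output line shown above, and the soft lockup counter assumes the usual syslog "Mon DD HH:MM:SS" line prefix (single-digit days are space-padded, e.g. "Mar  7"). The host thread count for step 4 is not specified, so it is left out.

```python
"""Hypothetical helpers for automating steps 3, 6 and 7 (a sketch, not heartbeat_slu.py)."""
import os
import re
from typing import Iterable, List, Optional, Tuple

# <local ts>: <host> <seq> <guest ts> (drift <total drift> (<drift delta>))
DRIFT_RE = re.compile(
    r"^(?P<local>[\d.]+): (?P<host>\S+) (?P<seq>\d+) (?P<guest>[\d.]+) "
    r"\(drift (?P<total>[+-]?[\d.]+) \((?P<delta>[+-]?[\d.]+)\)\)"
)


def stress_command(vcpus: Optional[int] = None) -> List[str]:
    """Step 3: stress command with twice as many CPU workers as guest vCPUs.

    When run inside the guest, os.cpu_count() reports the vCPUs it sees.
    """
    n = vcpus if vcpus is not None else (os.cpu_count() or 1)
    return ["screen", "stress", "-c", str(2 * n)]


def count_soft_lockups(lines: Iterable[str], days: Iterable[str]) -> int:
    """Step 6: count "soft lockup" lines whose syslog prefix is one of days, e.g. ["Mar 17"]."""
    wanted = tuple(days)
    return sum(1 for ln in lines if "soft lockup" in ln and ln.startswith(wanted))


def latest_drift(lines: Iterable[str], host: str) -> Optional[Tuple[float, float]]:
    """Step 7: (total_drift, drift_delta) from the last drift line for host, or None."""
    result = None
    for ln in lines:
        m = DRIFT_RE.match(ln.strip())
        if m and m.group("host") == host:
            result = (float(m.group("total")), float(m.group("delta")))
    return result
```

For instance, `subprocess.run(stress_command())` inside a guest would start the load, and `latest_drift(open("/tmp/heartbeat_server.out"), "my_guest_hostname")` would return `(0.01, -0.0)` for the sample line above.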


--Pradeep
_______________________________________________
Autotest mailing list
http://test.kernel.org/cgi-bin/mailman/listinfo/autotest
