-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9647/
-----------------------------------------------------------

Review request for cloudstack and Hugo Trippaers.


Description
-------

In some storage failure scenarios, an NFS timeout can make writing the 
heartbeat take longer than expected. This change compares the epoch of the 
last successful heartbeat with the current epoch to check whether the timeout 
value has been met.
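
The comparison described above could be sketched roughly as follows. This is 
an illustration only, not the code from the diff: the function name 
heartbeat_action and its arguments are hypothetical, and the "reboot" result 
merely marks where the real script would act.

```shell
#!/bin/bash
# Illustrative sketch only; the function and variable names are hypothetical
# and are not taken from xenheartbeat.sh.

# Decide what to do after a failed heartbeat write.
#   $1 = epoch of the last successful heartbeat
#   $2 = current epoch (for example from `date +%s`)
#   $3 = timeout in seconds
# Prints "warn" while the gap is below the timeout, "reboot" once it is met.
heartbeat_action() {
  local last_ok=$1 now=$2 timeout=$3
  local elapsed=$(( now - last_ok ))
  if [ "$elapsed" -ge "$timeout" ]; then
    echo "reboot"
  else
    echo "warn"
  fi
}
```

It might be called as heartbeat_action "$last_ok" "$(date +%s)" 120 after a 
failed write. Comparing epochs measures elapsed wall-clock time, so a write 
that blocks on an NFS timeout still counts for its full duration.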


Diffs
-----

  scripts/vm/hypervisor/xenserver/xenheartbeat.sh 5edacf7 

Diff: https://reviews.apache.org/r/9647/diff/


Testing
-------

Tested on hostxxx with an empty heartbeat file:
Feb 26 21:54:13 hostxxx heartbeat: Problem with heartbeat, no iSCSI or NFS 
mount defined in /opt/xensource/bin/heartbeat!

Tested on hostxxx with a 120-second timeout value by causing a storage 
failover (hits NFS timeout):
Feb 26 08:04:15 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-faecefb3-9ac0-47a2-b0fb-ae383762ba13:
 not reachable since 18 seconds
Feb 26 08:04:48 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-faecefb3-9ac0-47a2-b0fb-ae383762ba13:
 not reachable since 51 seconds
Feb 26 08:05:20 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/d392d770-330b-bdbf-9c07-e1c38af81c6e/hb-faecefb3-9ac0-47a2-b0fb-ae383762ba13:
 not reachable since 83 seconds
The storage failover stayed within the 120-second timeout value, so no reboot 
was triggered.

Tested on hostxxx with a 120-second timeout by removing the storage altogether 
(hits NFS timeout):
Feb 26 10:08:52 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 32 seconds
Feb 26 10:09:24 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 64 seconds
Feb 26 10:09:57 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 97 seconds
Feb 26 10:10:29 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 129 seconds
Feb 26 10:10:29 hostxxx heartbeat: Problem with /var/run/sr-mount/test/hb-test: 
not reachable since 129 seconds, rebooting system!

Tested on hostxxx with a 120-second timeout by removing write permissions on 
the storage (does not hit NFS timeout):
Feb 26 10:22:13 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 5 seconds
Feb 26 10:22:18 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 10 seconds
Feb 26 10:22:23 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 15 seconds
Feb 26 10:22:28 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 20 seconds
Feb 26 10:22:33 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 25 seconds
Feb 26 10:22:38 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 30 seconds
Feb 26 10:22:43 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 35 seconds
Feb 26 10:22:48 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 40 seconds
Feb 26 10:22:53 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 45 seconds
Feb 26 10:22:58 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 50 seconds
Feb 26 10:23:03 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 55 seconds
Feb 26 10:23:08 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 60 seconds
Feb 26 10:23:13 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 65 seconds
Feb 26 10:23:18 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 70 seconds
Feb 26 10:23:23 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 75 seconds
Feb 26 10:23:28 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 80 seconds
Feb 26 10:23:33 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 85 seconds
Feb 26 10:23:38 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 90 seconds
Feb 26 10:23:43 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 95 seconds
Feb 26 10:23:48 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 100 seconds
Feb 26 10:23:53 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 105 seconds
Feb 26 10:23:58 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 110 seconds
Feb 26 10:24:03 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 115 seconds
Feb 26 10:24:08 hostxxx heartbeat: Potential problem with 
/var/run/sr-mount/test/hb-test: not reachable since 120 seconds
Feb 26 10:24:08 hostxxx heartbeat: Problem with /var/run/sr-mount/test/hb-test: 
not reachable for 120 seconds, rebooting system!


Thanks,

Brenn Oosterbaan
