[ https://issues.apache.org/jira/browse/CLOUDSTACK-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sheng Yang updated CLOUDSTACK-1653:
-----------------------------------
    Fix Version/s:     (was: 4.1.0)
                   4.2.0

> Redundant router: check_heartbeat.sh malfunction caused by delayed cron job
> ---------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-1653
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-1653
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: 4.1.0
>            Reporter: Sheng Yang
>            Assignee: Sheng Yang
>             Fix For: 4.2.0
>
>
> According to https://bugzilla.redhat.com/show_bug.cgi?id=159441,
> cron can only guarantee the minimum interval between job executions, so two
> consecutive runs of check_heartbeat.sh may be more than 1 minute apart.
> Since keepalived should update keepalived.ts every 10 seconds, the check
> should fail if the timestamp gap between any two executions is less than
> 60 seconds.
> The current logic in check_heartbeat.sh is wrong: it only guarantees that
> cron was not delayed, not that keepalived is alive.
> It passed the original test because that was an NFS disconnection test, in
> which case the disk is corrupted, so cron gets delayed, which means the
> network is down.
> Changing the condition to less than 60 (30 is probably safer, because cron
> sometimes seems to have a bug where it does not meet the minimum interval
> requirement) should work too, because the script should find out that
> keepalived is dead the second time it is executed after NFS recovers.
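
For illustration only, the proposed condition can be sketched in shell as
below. This is not the actual check_heartbeat.sh: the function name
heartbeat_alive and its arguments are hypothetical, and it assumes that
keepalived.ts holds a timestamp in epoch seconds and that the value seen by
the previous run has been saved somewhere by the caller.

```shell
#!/bin/bash
# Hypothetical sketch of the proposed check; not the real check_heartbeat.sh.
# $1: keepalived.ts value seen by the previous run (assumed epoch seconds)
# $2: keepalived.ts value seen by this run (assumed epoch seconds)
# $3: minimum advance required, in seconds (30 is the value suggested as safer)
heartbeat_alive() {
    local last_ts=$1 this_ts=$2 min_advance=${3:-30}
    local diff=$(( this_ts - last_ts ))
    # cron runs are at least about a minute apart and keepalived is expected
    # to refresh the timestamp every 10 seconds, so a live keepalived should
    # have advanced it by at least min_advance seconds between two checks.
    [ "$diff" -ge "$min_advance" ]
}

# Timestamp advanced by 60 seconds between two runs: keepalived is alive.
if heartbeat_alive 1000 1060; then echo "alive"; else echo "dead"; fi
# Timestamp did not move between two runs: keepalived is dead.
if heartbeat_alive 1000 1000; then echo "alive"; else echo "dead"; fi
# cron was delayed, but keepalived kept updating: still alive.
if heartbeat_alive 1000 1200; then echo "alive"; else echo "dead"; fi
```

With a lower bound like this, a delayed cron run only makes the observed gap
larger and does not trip the check; a stalled keepalived.ts does.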