[ https://issues.apache.org/jira/browse/CLOUDSTACK-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sheng Yang updated CLOUDSTACK-1653:
-----------------------------------

    Fix Version/s:     (was: 4.1.0)
                   4.2.0
    
> Redundant router: check_heartbeat.sh malfunction caused by delayed cron job
> ---------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-1653
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-1653
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public (Anyone can view this level - this is the default.)
>    Affects Versions: 4.1.0
>            Reporter: Sheng Yang
>            Assignee: Sheng Yang
>             Fix For: 4.2.0
>
>
> According to https://bugzilla.redhat.com/show_bug.cgi?id=159441, cron only
> guarantees a minimum interval between job executions, so two consecutive runs
> of check_heartbeat.sh may be more than one minute apart.
> Since keepalived should update keepalived.ts every 10 seconds, the timestamp
> should advance by about a minute between two consecutive runs of the check;
> if the gap between the timestamps seen by two consecutive runs is less than
> 60 seconds, the check should fail.
> The current logic in check_heartbeat.sh is wrong: it only verifies that cron
> was not delayed, not that keepalived is alive.
> It passed the original test only because that was an NFS disconnection test,
> in which the disk was corrupted, so cron was delayed, which coincided with the
> network being down.
> Changing the condition to "less than 60 seconds" (30 is probably safer, since
> cron sometimes seems not to honor the minimum interval) should handle that
> case too, because the script should detect that keepalived is dead on its
> second execution after NFS recovers.
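
A minimal bash sketch of the proposed condition follows. It is not the actual CloudStack script; it assumes keepalived writes a Unix timestamp (date +%s) into keepalived.ts, that each run saves the value it saw into a second file for the next run, and the function names heartbeat_ok and check_heartbeat_files, the file arguments, and the THRESHOLD variable are all hypothetical. The check fails when the timestamp advanced by less than the threshold between two runs, which is what happens when keepalived has stopped updating the file.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the corrected heartbeat check; not the CloudStack script.
# Assumes keepalived writes `date +%s` into a timestamp file every 10 seconds
# and that cron runs this check roughly once a minute.

# Minimum expected advance in seconds. The report suggests 60, with 30 as a
# safer value because cron may not always honor the minimum interval.
THRESHOLD=30

# heartbeat_ok LAST CURRENT [THRESHOLD]
# Returns 0 if the keepalived timestamp advanced by at least THRESHOLD seconds
# since the previous run, 1 otherwise (keepalived presumed dead).
heartbeat_ok() {
    local last=$1 current=$2 threshold=${3:-$THRESHOLD}
    local diff=$(( current - last ))
    [ "$diff" -ge "$threshold" ]
}

# check_heartbeat_files TS_FILE PREV_FILE
# Compares the current timestamp with the one saved by the previous run, then
# saves the current value for the next run. Returns 0 on the first run (nothing
# to compare yet), 1 if keepalived looks dead, 2 if TS_FILE cannot be read.
check_heartbeat_files() {
    local ts_file=$1 prev_file=$2 status=0 current
    current=$(cat "$ts_file") || return 2
    if [ -e "$prev_file" ]; then
        heartbeat_ok "$(cat "$prev_file")" "$current" || status=1
    fi
    cp "$ts_file" "$prev_file"
    return "$status"
}
```

With THRESHOLD=30, a live keepalived sampled once a minute gives a gap of about 60 seconds and passes, while a dead keepalived leaves the file unchanged, gives a gap of 0, and fails. A delayed cron run only makes the gap larger, so it no longer decides the outcome.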

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
