[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626313#comment-14626313
 ] 

ASF GitHub Bot commented on CLOUDSTACK-8616:
--------------------------------------------

GitHub user wilderrodrigues opened a pull request:

    https://github.com/apache/cloudstack/pull/587

    CLOUDSTACK-8616: Redundant VPR with both routers as Master

    This PR contains some refactoring of the Python code used by the redundant 
routers and also a fix for the intermittent problem when running the rVPC 
component tests.
    
    To summarise it:
    
    * If the KeepaliveD configuration file changes, restart the service 
instead of reloading it.
    * Since we are configuring KeepaliveD/VRRP in non-preemptive mode, we no 
longer need priorities. As a matter of fact, the Management Server was not 
sending priorities to the routers anymore. The value used in the old 
configuration was defaulted to 99 in the Python code.
    * KeepaliveD and ConntrackD, once configured in a router, will have a 
cronjob that will run on reboot. So, the services will be restarted without the 
need to wait for the management server to send some configuration and force a 
restart.
    * Installing KeepaliveD from Wheezy-Backports in order to have a newer 
version available.
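
    A minimal Python sketch of the first point above: restart, rather than 
reload, only when the configuration file actually changes. The names 
config_changed, apply_config and restart_service are hypothetical and are not 
taken from this PR; restart_service stands in for whatever the router uses to 
restart the service.

```python
import os


def config_changed(path, new_content):
    """Return True when the file at `path` is missing or differs from `new_content`."""
    if not os.path.exists(path):
        return True
    with open(path) as handle:
        return handle.read() != new_content


def apply_config(path, new_content, restart_service):
    """Write `new_content` to `path` and restart the service only if it changed.

    `restart_service` is a hypothetical callable supplied by the caller; this
    sketch makes no assumption about how the real code restarts KeepaliveD.
    """
    if not config_changed(path, new_content):
        return False  # nothing changed, leave the running service alone
    with open(path, "w") as handle:
        handle.write(new_content)
    restart_service()  # full restart instead of a reload
    return True
```

    Passing the restart action in as a callable keeps the comparison logic 
testable without touching a real service.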
    
    I already squashed a few commits of this PR so we wouldn't have to go through 
simple fixes/typos that happened during the tryouts. When opening the commits 
for review please note that the commit messages also contain the messages of 
the squashed commits.
    
    Adding the cronjob to restart the KeepaliveD service on reboot helped to 
get a 60% success rate with the tests. Before that, the tests were failing very 
often: 4 out of 5 times.
    
    I then added the "restart" when configuration changes instead of "reload". 
Once the change was applied, I successfully executed the tests 13 times. That 
gives confidence.
    
    Tests can be executed with the following command:
    
    nosetests --with-marvin --marvin-config=[your_configuration_file] -s -a tags=advanced,required_hardware=true component/test_vpc_redundant.py
    
    Since marvin/base.py was changed in the previous PR, you will need to 
build/upgrade your Marvin installation.
    
    @DaanHoogland @bhaisaab @remibergsma, could you please have a look at this 
PR?
    
    Cheers,
    Wilder


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/schubergphilis/cloudstack fix/CLOUDSTACK-8616

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cloudstack/pull/587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #587
    
----
commit c35c6661696ab3c3c1ddfb6794bd293a76b2463b
Author: wilderrodrigues <[email protected]>
Date:   2015-07-08T05:24:35Z

    CLOUDSTACK-8616 - Removing the Priority from KeepaliveD configuration
    
       - We use no-preempt mode with the state set as EQUAL on both nodes, so 
there is no need to set up Priorities
       - Do not add IPs as comments to the configuration. If a new guest 
interface is added, the file will change anyway.
         - This was used in the past when keepalived would restart for each new 
interface added
       - Removed the long sleep from the tests: we now sleep 5 seconds per PF 
rule added
    
    CLOUDSTACK-8616 - Fix keepalived.ts/2 files comparison
    
       - Add call to set_fault() in case the router transitions to that state
       - Removing commented out code
    
    CLOUDSTACK-8616 - Fixing check_heartbeat.sh.templ
    
    CLOUDSTACK-8616 - Call set_fault from the check_heartbeat.sh script

commit c975185318cbfd00e9d5e346b4fc9ea2c76e8098
Author: wilderrodrigues <[email protected]>
Date:   2015-07-09T09:40:32Z

    CLOUDSTACK-8616 - Add keepalived start on reboot
    
       - Runs check_heartbeat.sh every 30 seconds
    
    CLOUDSTACK-861 - Copy/Paste error
    
       - Pasted the wrong command in the crontab line.

commit c20b5f3ff1e56b4db296bd2ec46f0cd8ed538b29
Author: wilderrodrigues <[email protected]>
Date:   2015-07-10T06:41:28Z

    CLOUDSTACK-8616 - Installing KeepaliveD from Debian Wheezy backports
    
       - preempt delay reverted on version 1.2.13 - from the backports
         - vrrp : Revert "Honor preempt_delay setting on startup.".
         - See changelog: http://www.keepalived.org/changelog.html
       - Refactoring some variable names to avoid misunderstanding

commit 118d7b79f4f5f15a9931ff1cc6e2cc91a562ee11
Author: wilderrodrigues <[email protected]>
Date:   2015-07-13T17:29:41Z

    CLOUDSTACK-8616 - Add a cron job to restart ConntrackD on reboot

----
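
The cron-related commits above can be sketched in Python as follows. This is 
an illustration only, not the code from the PR: the cron_lines helper, the 
script path used in the usage note and the exact service commands are 
assumptions. It relies on two general cron facts: the smallest schedule 
interval is one minute, so a 30-second period is usually built from two 
entries with one delayed by "sleep 30", and "@reboot" runs an entry once at 
boot.

```python
def cron_lines(script, services, user="root"):
    """Build /etc/cron.d style lines (hypothetical helper, not from the PR).

    `script` is run every 30 seconds; each name in `services` is restarted
    once at boot.
    """
    lines = []
    # cron cannot schedule below one minute, so run the script every minute
    # twice: once immediately and once after a 30-second delay.
    lines.append("* * * * * %s %s" % (user, script))
    lines.append("* * * * * %s sleep 30; %s" % (user, script))
    for service in services:
        # @reboot fires once when the cron daemon starts after a boot.
        lines.append("@reboot %s service %s restart" % (user, service))
    return lines
```

For example, cron_lines("/opt/example/check_heartbeat.sh", ["keepalived", 
"conntrackd"]) returns four lines, where the /opt/example path is made up 
for the illustration.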


> Redundant VPR with both routers as Master
> -----------------------------------------
>
>                 Key: CLOUDSTACK-8616
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8616
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Virtual Router
>    Affects Versions: 4.6.0
>            Reporter: Wilder Rodrigues
>            Assignee: Wilder Rodrigues
>
> There is an intermittent problem with keepalived on the redundant VPC 
> routers. Sometimes both routers stay in Master state for a while.
> We are able to reproduce it only when testing with Marvin, which executes the 
> calls very quickly. When using the UI and following the same steps, it doesn't 
> happen.
> Setting up:
> 1. Create a VPC using redundant VPC offering
> 2. Add 2 Tiers
> 3. Create 2 VMs in each Tier
> 4. Create ACLs to allow traffic on port 22 coming from 0.0.0.0/0
> 5. Acquire 4 public IPs
> 6. Create Port Forwarding rules - per IP - for port 22
> 7. Assign each PF created to one of the VMs
> 8. SSH to the VMs
> Testing fail over:
> 1. Stop the Master Router
> 2. Check that the Backup Router became Master
> 3. SSH to the VMs 
> Testing failure:
> 1. Delete all port forwarding rules
> 2. SSH to the VMs 
> 3. Verify that it no longer works
> Testing recovery:
> 1. Restart the router
> 2. Once the router is running, check that it's on Backup state
> 3. Add the port forwarding rules back
> 4. Verify that the routers are still on the same state: 1 Master and 1 Backup
>     - That's the part when it fails during the Marvin tests
>     - When 2 routers are on Master, restarting 1 router brings 
> everything back to a normal state: 1 master and 1 backup
> 5. SSH to the VMs 
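
The state check in step 4 of the recovery test can be sketched in Python as 
below. This is a minimal illustration, not the Marvin test code: the function 
names, the "MASTER"/"BACKUP" strings and the get_states callable are 
assumptions made for the example.

```python
import time


def has_single_master(states):
    """True when exactly one router is MASTER and all the others are BACKUP."""
    masters = [state for state in states if state == "MASTER"]
    backups = [state for state in states if state == "BACKUP"]
    return len(masters) == 1 and len(masters) + len(backups) == len(states)


def wait_for_single_master(get_states, attempts=10, delay=5.0):
    """Poll `get_states` until the routers settle on 1 Master and 1 Backup.

    `get_states` is a hypothetical callable returning the current state of
    each router, e.g. ["MASTER", "BACKUP"].
    """
    for _ in range(attempts):
        if has_single_master(get_states()):
            return True
        time.sleep(delay)  # give the routers time to settle before retrying
    return False
```

Polling with a bounded number of attempts fits the description above, where 
both routers may stay on Master only for a while.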



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
