[ https://issues.apache.org/jira/browse/CLOUDSTACK-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626464#comment-14626464 ]
ASF GitHub Bot commented on CLOUDSTACK-8616:
--------------------------------------------
Github user bhaisaab commented on a diff in the pull request:
https://github.com/apache/cloudstack/pull/587#discussion_r34579656
--- Diff: systemvm/patches/debian/config/opt/cloud/templates/check_heartbeat.sh.templ ---
@@ -47,13 +47,14 @@ then
if [ $s -gt 2 ]
then
echo Keepalived process is dead! >> $ROUTER_LOG
- $ROUTER_BIN_PATH/services.sh stop >> $ROUTER_LOG 2>&1
- $ROUTER_BIN_PATH/disable_pubip.sh >> $ROUTER_LOG 2>&1
- $ROUTER_BIN_PATH/primary-backup.sh fault >> $ROUTER_LOG 2>&1
service keepalived stop >> $ROUTER_LOG 2>&1
service conntrackd stop >> $ROUTER_LOG 2>&1
- pkill -9 keepalived >> $ROUTER_LOG 2>&1
- pkill -9 conntrackd >> $ROUTER_LOG 2>&1
+
+ #Set fault so we have the same effect as a KeepaliveD fault.
+ python /opt/cloud/bin/master.py --fault
--- End diff --
Will running the script here block this shell script until it exits, or
will it launch as a daemon? If it's blocking, then it's alright.
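In a shell script, a command started without a trailing `&` runs in the foreground. The script waits for that process to exit before it runs the next line. The `master.py --fault` call in the diff should therefore block. The exception is if `master.py` daemonizes itself, meaning it forks and lets its parent exit early, in which case the shell regains control as soon as the parent returns. This thread does not say which of the two `master.py` does.

The sketch below illustrates the difference under these assumptions:
- `fake_master_fault` is a hypothetical stand-in for `python /opt/cloud/bin/master.py --fault` that just sleeps.
- The temporary log file only records the order of events.

```bash
#!/usr/bin/env bash
# Sketch: does a command in a shell script block until it exits?
# fake_master_fault is a HYPOTHETICAL stand-in for
# `python /opt/cloud/bin/master.py --fault`; it only sleeps and logs.
set -u

log=$(mktemp)
trap 'rm -f "$log"' EXIT   # remove the log when the script exits

fake_master_fault() {
    sleep 1
    echo "handler done" >> "$log"
}

# Foreground call: the next line does not run until the function returns.
fake_master_fault
echo "after foreground call" >> "$log"

# Background call: `&` returns control immediately; $! holds the child PID.
fake_master_fault &
pid=$!
echo "after background call" >> "$log"

# `wait` blocks until the background child has exited.
wait "$pid"
echo "after wait" >> "$log"

cat "$log"
```

The log shows "handler done" before "after foreground call", but "after background call" before the second "handler done". Only the backgrounded call lets the script move on while the handler is still running.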
> Redundant VPC with both routers as Master
> -----------------------------------------
>
> Key: CLOUDSTACK-8616
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-8616
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public (Anyone can view this level - this is the default.)
> Components: Virtual Router
> Affects Versions: 4.6.0
> Reporter: Wilder Rodrigues
> Assignee: Wilder Rodrigues
>
> There is an intermittent problem with keepalived on the redundant VPC
> routers: sometimes both routers stay in the Master state for a while.
> We are able to reproduce it only when testing with Marvin, which executes the
> calls very quickly. When using the UI and following the same steps, it doesn't
> happen.
> Setting up:
> 1. Create a VPC using redundant VPC offering
> 2. Add 2 Tiers
> 3. Create 2 VMs in each Tier
> 4. Create ACLs to allow traffic on port 22 coming from 0.0.0.0/0
> 5. Acquire 4 public IPs
> 6. Create Port Forwarding rules - per IP - for port 22
> 7. Assign each PF created to one of the VMs
> 8. SSH to the VMs
> Testing failover:
> 1. Stop the Master Router
> 2. Check that the Backup Router became Master
> 3. SSH to the VMs
> Testing failure:
> 1. Delete all port forwarding rules
> 2. SSH to the VMs
> 3. Verify that it no longer works
> Testing recovery:
> 1. Restart the router
> 2. Once the router is running, check that it's in the Backup state
> 3. Add the port forwarding rules back
> 4. Verify that the routers are still in the same state: 1 Master and 1 Backup
> - This is the step that fails during the Marvin tests
> - When both routers are in the Master state, restarting one router brings
> everything back to a normal state: 1 Master and 1 Backup
> 5. SSH to the VMs
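Step 4 of the recovery test, like the symptom in the issue title, comes down to one check: of the two routers, exactly one should be Master and the other Backup.

A minimal sketch of that check follows. It assumes the two state strings have already been collected. How you fetch them, from the management server or on the routers themselves, is left open. `check_redundant_pair` is a hypothetical helper, not part of CloudStack or Marvin.

```bash
#!/usr/bin/env bash
# Sketch: succeed only if a redundant router pair has exactly one MASTER
# and one BACKUP. check_redundant_pair is a HYPOTHETICAL helper; the state
# strings are assumed to be gathered elsewhere and passed as arguments.
set -u

check_redundant_pair() {
    local masters=0 backups=0 state
    for state in "$@"; do
        # Upper-case each state so "Master" and "MASTER" compare equal.
        case "${state^^}" in
            MASTER) masters=$((masters + 1)) ;;
            BACKUP) backups=$((backups + 1)) ;;
        esac
    done
    if [ "$#" -eq 2 ] && [ "$masters" -eq 1 ] && [ "$backups" -eq 1 ]; then
        echo "OK: 1 Master and 1 Backup"
        return 0
    fi
    echo "FAIL: $masters Master(s), $backups Backup(s) out of $# router(s)" >&2
    return 1
}
```

For example, `check_redundant_pair MASTER BACKUP` prints the OK line and returns 0. `check_redundant_pair MASTER MASTER`, the state described in this bug, prints a FAIL line to stderr and returns 1. That makes the helper usable as a gate in a scripted version of step 4.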
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)