[
https://issues.apache.org/jira/browse/CLOUDSTACK-4199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742042#comment-13742042
]
venkata swamybabu budumuru commented on CLOUDSTACK-4199:
--------------------------------------------------------
I have also seen this issue every time during failover. The steps to
reproduce are below:
1. 1 advanced zone with KVM cluster (2 KVM hosts)
2. Create an offering with RVR enabled.
*************************** 15. row ***************************
id: 15
name: RVR
uuid: 4e91c49f-5870-43e1-9865-0a84cd7b72ae
unique_name: RVR
display_text: RVR
nw_rate: NULL
mc_rate: 10
traffic_type: Guest
tags: NULL
system_only: 0
specify_vlan: 0
service_offering_id: NULL
conserve_mode: 1
created: 2013-08-16 05:05:34
removed: NULL
default: 0
availability: Optional
dedicated_lb_service: 1
shared_source_nat_service: 0
sort_key: 0
redundant_router_service: 1 =========> RVR is enabled
state: Enabled
guest_type: Isolated
elastic_ip_service: 0
eip_associate_public_ip: 0
elastic_lb_service: 0
specify_ip_ranges: 0
inline: 0
is_persistent: 1 =====> Persistent is enabled.
internal_lb: 0
public_lb: 1
egress_default_policy: 1
concurrent_connections: NULL
15 rows in set (0.00 sec)
3. As a non-ROOT domain user, try to deploy a VM using the above network
offering.
non-ROOT domain user info :
username : dom1User1
password : password
domain : dom1
*************************** 20. row ***************************
id: 220
name: swamyRVRNetwork
uuid: 215f3f85-dca2-45e4-9cab-607654677575
display_text: swamyRVRNetwork
traffic_type: Guest
broadcast_domain_type: Vlan
broadcast_uri: vlan://908
gateway: 10.1.1.1
cidr: 10.1.1.0/24
mode: Dhcp
network_offering_id: 15
physical_network_id: 200
data_center_id: 1
guru_name: ExternalGuestNetworkGuru
state: Implemented
related: 220
domain_id: 2
account_id: 3
dns1: NULL
dns2: NULL
guru_data: NULL
set_fields: 0
acl_type: Account
network_domain: cs3auto.advanced
reservation_id: c81b7838-db46-4d54-a5ed-4f6261802fb6
guest_type: Isolated
restart_required: 0
created: 2013-08-16 07:30:48
removed: NULL
specify_ip_ranges: 0
vpc_id: NULL
ip6_gateway: NULL
ip6_cidr: NULL
network_cidr: NULL
display_network: 1
network_acl_id: NULL
*************************** 48. row ***************************
id: 48
name: VM1Swamy
uuid: 6bfe2221-74b7-4de6-9b46-ae2f5ea1a661
instance_name: i-3-48-QA
state: Running
vm_template_id: 202
guest_os_id: 112
private_mac_address: 02:00:68:99:00:03
private_ip_address: 10.1.1.23
pod_id: 1
data_center_id: 1
host_id: 2
last_host_id: 2
proxy_id: NULL
proxy_assign_time: NULL
vnc_password: WFdUuz6e2W97XHGv7YnHc/8b0BH/HqK3eWpX3zxP97U=
ha_enabled: 0
limit_cpu_use: 0
update_count: 3
update_time: 2013-08-16 07:35:17
created: 2013-08-16 07:33:25
removed: NULL
type: User
vm_type: User
account_id: 3
domain_id: 2
service_offering_id: 2
reservation_id: 3baf28f3-745b-4dad-8fe9-8bab92bec033
hypervisor_type: KVM
disk_offering_id: NULL
cpu: NULL
ram: NULL
owner: 3
speed: 1000
host_name: VM1Swamy
display_name: VM1Swamy
desired_state: NULL
dynamically_scalable: 0
display_vm: 1
4. The above steps deployed RVR routers without any issues
*************************** 46. row ***************************
id: 46
name: r-46-QA =====> This became MASTER
uuid: d044fae3-316e-4546-b832-ab9e12b074a3
instance_name: r-46-QA
state: Stopped
vm_template_id: 3
guest_os_id: 15
private_mac_address: 0e:00:a9:fe:01:69
private_ip_address: 169.254.1.105
pod_id: 1
data_center_id: 1
host_id: NULL
last_host_id: 3
proxy_id: NULL
proxy_assign_time: NULL
vnc_password: eMTnIdbchG5GWMGzs5awGTGs4M7LuYjmLBlmCMMBLSw=
ha_enabled: 0
limit_cpu_use: 0
update_count: 5
update_time: 2013-08-16 07:41:43
created: 2013-08-16 07:30:48
removed: NULL
type: DomainRouter
vm_type: DomainRouter
account_id: 3
domain_id: 2
service_offering_id: 7
reservation_id: c70dbe54-8f26-40c0-a111-720b77d4a2c1
hypervisor_type: KVM
disk_offering_id: NULL
cpu: NULL
ram: NULL
owner: NULL
speed: NULL
host_name: NULL
display_name: NULL
desired_state: NULL
dynamically_scalable: 0
display_vm: 1
*************************** 47. row ***************************
id: 47
name: r-47-QA =====> This became BACKUP
uuid: 49080daa-3a00-4967-94cf-594b42375e6e
instance_name: r-47-QA
state: Running
vm_template_id: 3
guest_os_id: 15
private_mac_address: 0e:00:a9:fe:03:8d
private_ip_address: 169.254.3.141
pod_id: 1
data_center_id: 1
host_id: 2
last_host_id: 2
proxy_id: NULL
proxy_assign_time: NULL
vnc_password: UZ483zh1Nq/Ydq2mg1/v4I7mRqaSShk6vd6tWx84rQI=
ha_enabled: 0
limit_cpu_use: 0
update_count: 7
update_time: 2013-08-16 07:33:24
created: 2013-08-16 07:30:49
removed: NULL
type: DomainRouter
vm_type: DomainRouter
account_id: 3
domain_id: 2
service_offering_id: 7
reservation_id: 4ad79ebb-7c77-43b4-add2-fd3669d94d2f
hypervisor_type: KVM
disk_offering_id: NULL
cpu: NULL
ram: NULL
owner: NULL
speed: NULL
host_name: NULL
display_name: NULL
desired_state: NULL
dynamically_scalable: 0
display_vm: 1
5. Stop the MASTER VR from CloudStack
Observations:
(i) The MASTER router went into the Stopped state successfully, but the BACKUP
router got stuck in the "FAULT" state and never recovered.
Here is a snippet of keepalived.log from the router in FAULT state:
root@r-47-QA:~# cat /ramdisk/rrouter/keepalived.log
To backup called
Disable public ip 0
Password server is not running
Stopping DNS forwarder and DHCP server: dnsmasq(not running) ... (warning).
cache internal:
current active connections: 0
connections created: 0 failed: 0
connections updated: 0 failed: 0
connections destroyed: 0 failed: 0
cache external:
current active connections: 0
connections created: 0 failed: 0
connections updated: 0 failed: 0
connections destroyed: 0 failed: 0
traffic processed:
0 Bytes 0 Pckts
multicast traffic (active device=eth0):
8 Bytes sent 0 Bytes recv
1 Pckts sent 0 Pckts recv
0 Error send 0 Error recv
message tracking:
0 Malformed msgs 0 Lost msgs
Conntrackd switch to backup done
Switch conntrackd mode backup 0
Status: BACKUP
To master called
ifdown: interface eth2 not configured
RTNETLINK answers: File exists
Failed to bring up eth2.
RTNETLINK answers: No such process
Enable public ip returned 2
Fail to enable public ip!
Password server is not running
Stopping DNS forwarder and DHCP server: dnsmasq(not running) ... (warning).
Stopping keepalived: keepalived.
Stopping conntrackd.
Status: FAULT (RTNETLINK answers: No such process)
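To make the sequence in this log easier to compare across routers, here is a
minimal Python sketch that pulls out the "To <role> called" transitions and
the "Status:" lines. The function name and the regular expressions are my own
assumptions based only on the lines shown above. This is not a CloudStack or
keepalived tool.

```python
import re

# Hypothetical helper (not part of CloudStack): summarise a keepalived.log
# excerpt. The patterns are guessed from the log lines shown in this comment.
TRANSITION_RE = re.compile(r"^To (\w+) called$")
STATUS_RE = re.compile(r"^Status:\s+(\w+)(?:\s+\((.*)\))?\s*$")


def summarize_keepalived_log(text):
    """Return (transitions, statuses) found in the log text.

    transitions: requested roles in order, e.g. ["backup", "master"]
    statuses: (state, reason) tuples; reason is None when no "(...)" follows
    """
    transitions = []
    statuses = []
    for raw in text.splitlines():
        line = raw.strip()
        match = TRANSITION_RE.match(line)
        if match:
            transitions.append(match.group(1))
            continue
        match = STATUS_RE.match(line)
        if match:
            statuses.append((match.group(1), match.group(2)))
    return transitions, statuses


# A shortened copy of the log above, used only as sample input.
SAMPLE = """\
To backup called
Status: BACKUP
To master called
Failed to bring up eth2.
Enable public ip returned 2
Status: FAULT (RTNETLINK answers: No such process)
"""

transitions, statuses = summarize_keepalived_log(SAMPLE)
print(transitions)
print(statuses)
```

The last entry in statuses is the state the router ended in. Here that is
FAULT, with the RTNETLINK message as the reason.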
Attaching the following logs to the bug along with mgmt server db dump.
- mgmt server log
- db dump
- MASTER (before reboot logs)
* ifconfig output
* ifconfig -a output
* /ramdisk/rrouter/keepalived.log
* checkrouter.sh output
- BACKUP (before and after failover)
* ifconfig output
* ifconfig -a output
* /ramdisk/rrouter/keepalived.log
* checkrouter.sh output
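When going through the attached mgmt server log, the interesting part is the
"details" string returned for CheckRouterCommand. In the original report
quoted below it reads "Status: FAULT (RTNETLINK answers: No such
process)&Bumped: NO". Here is a rough Python sketch for splitting that string
into its parts. The function name is hypothetical. I am also assuming that a
bumped router reports "YES", since only "NO" appears in these logs.

```python
def parse_check_router_details(details):
    """Split 'Status: X (reason)&Bumped: YES|NO' into a small dict.

    Hypothetical helper; the format is inferred from the log lines only.
    """
    fields = {}
    for part in details.split("&"):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    # "FAULT (RTNETLINK answers: ...)" -> state "FAULT", reason in parentheses
    state, _, rest = fields.get("Status", "").partition(" ")
    reason = rest.strip()
    if reason.startswith("(") and reason.endswith(")"):
        reason = reason[1:-1]

    return {
        "state": state or None,
        "reason": reason or None,
        # Assumption: "YES" marks a bumped router; only "NO" is seen in the logs.
        "bumped": fields.get("Bumped", "").upper() == "YES",
    }


parsed = parse_check_router_details(
    "Status: FAULT (RTNETLINK answers: No such process)&Bumped: NO"
)
print(parsed)
```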
> Redundant Virtual Router - no failover occur
> --------------------------------------------
>
> Key: CLOUDSTACK-4199
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4199
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Management Server
> Affects Versions: 4.2.0
> Environment: MS ACS 4.2 campo internal build 341
> host XS 6.2
> Reporter: angeline shen
> Priority: Critical
> Fix For: 4.2.0
>
> Attachments: management-server.log.gz, Screenshot-CloudPlatform™ -
> Mozilla Firefox-3.png, Screenshot-CloudPlatform™ - Mozilla Firefox-4.png
>
>
> 1. create network offering 'egallowrvrnw1' with egress firewall policy :
> allow , redundant router
> advance zone. create network of this offering. create guest VMs
> Verify ssh to VMs. VMs can ping other VMs in this network & reach
> internet
> 2. RVR MASTER r-37-VM
> RVR BACKUP r-38-VM
> stop r-37-VM
> Result: r-37-VM state becomes UNKNOWN
> r-38-VM state becomes FAULT
> no failover occur
> Cannot ssh to existing VMs
> 3. start r-37-VM.
> Result: r-37-VM state becomes MASTER
> r-38-VM state remains FAULT
> VMs can reach other VMs in same network.
> VMs cannot reach internet
> 4. stop r-37-VM
> r-37-VM state becomes UNKNOWN
> r-38-VM state becomes FAULT
> no failover occur
> Cannot ssh to existing VMs
> r.VirtualNetworkApplianceManagerImpl] (RouterStatusMonitor-1:null) Found 1
> networks to update RvR status.
> 2013-08-08 19:22:44,763 INFO
> [network.router.VirtualNetworkApplianceManagerImpl]
> (RedundantRouterStatusMonitor-6:null) Redundant virtual router (name:
> r-37-VM, id: 37) just switch from MASTER to UNKNOWN
> 2013-08-08 19:22:44,768 DEBUG [agent.transport.Request]
> (RedundantRouterStatusMonitor-6:null) Seq 1-2062888873: Sending { Cmd ,
> MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.CheckRouterCommand":{"accessDetails":{"router.ip":"169.254.3.245","router.name":"r-38-VM"},"wait":30}}]
> }
> 2013-08-08 19:22:44,769 DEBUG [agent.transport.Request]
> (RedundantRouterStatusMonitor-6:null) Seq 1-2062888873: Executing: { Cmd ,
> MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 100011,
> [{"com.cloud.agent.api.CheckRouterCommand":
> 2013-08-08 19:22:45,116 INFO
> [network.router.VirtualNetworkApplianceManagerImpl]
> (RedundantRouterStatusMonitor-6:null) Redundant virtual router (name:
> r-38-VM, id: 38) just switch from BACKUP to FAULT
> 2013-08-08 19:22:45,344 DEBUG [agent.manager.DirectAgentAttache]
> (DirectAgent-270:null) Seq 1-2062888874: Response Received:
> 2013-08-08 19:22:45,345 DEBUG [agent.transport.Request]
> (DirectAgent-270:null) Seq 1-2062888874: Processing: { Ans: , MgmtId:
> 7343890761426, via: 1, Ver: v1, Flags: 10,
> [{"com.cloud.agent.api.CheckRouterAnswer":{"state":"FAULT","isBumped":false,"result":true,"details":"Status: FAULT (RTNETLINK answers: No such process)&Bumped: NO","wait":0}}] }
> 2013-08-08 19:22:45,345 DEBUG [agent.transport.Request]
> (RedundantRouterStatusMonitor-6:null) Seq 1-2062888874: Received: { Ans: ,
> MgmtId: 7343890761426, via: 1, Ver: v1, Flags: 10, { CheckRouterAnswer } }
> 2013-08-08 19:22:45,345 DEBUG [agent.manager.AgentManagerImpl]
> (RedundantRouterStatusMonitor-6:null) Details from executing class
> com.cloud.agent.api.CheckRouterCommand: Status: FAULT (RTNETLINK answers: No such process)&Bumped: NO
> 2013-08-08 19:22:45,349 INFO
> [network.router.VirtualNetworkApplianceManagerImpl]
> (RedundantRouterStatusMonitor-6:null) Redundant virtual router (name:
> r-38-VM, id: 38) just switch from BACKUP to FAULT
> 2013-08-08 19:22:46,781 DEBUG [agent.manager.AgentManagerImpl]
> (AgentManager-Handler-13:null) Ping from 2
> 2013-08-08 19:22:47,125 DEBUG [agent.manager.AgentManagerImpl]
> (AgentManager-Handler-12:null) Ping from 3
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira