venkata swamybabu budumuru created CLOUDSTACK-4080:
------------------------------------------------------
Summary: Router VM is stopped by the network scavenger thread during
DeployVMCmd if network.gc.interval and network.gc.wait are set to a low value such as "10" seconds
Key: CLOUDSTACK-4080
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4080
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Network Controller
Affects Versions: 4.2.0
Environment: commit # 7522f811672f66bc0cc13a33f4f3737ef03f22af
Reporter: venkata swamybabu budumuru
Priority: Blocker
Fix For: 4.2.0
Steps to reproduce:
1. Have the latest CloudStack setup with at least one advanced zone.
2. Make sure network.gc.interval and network.gc.wait are both set to "10" seconds.
3. Have at least one network offering of type "isolated" with all services
enabled, where LB is provided by NetScaler (NS) and the other services are provided by the virtual router (VR).
mysql> select * from network_offerings where id=15\G
*************************** 1. row ***************************
id: 15
name: NetworkOffering with NS
uuid: 4aaf5c58-6d45-4213-8c26-0b2b6f6792c5
unique_name: NetworkOffering with NS
display_text: NetworkOffering with NS
nw_rate: NULL
mc_rate: 10
traffic_type: Guest
tags: NULL
system_only: 0
specify_vlan: 0
service_offering_id: NULL
conserve_mode: 0
created: 2013-08-05 07:30:38
removed: NULL
default: 0
availability: Optional
dedicated_lb_service: 0
shared_source_nat_service: 0
sort_key: 0
redundant_router_service: 0
state: Enabled
guest_type: Isolated
elastic_ip_service: 0
eip_associate_public_ip: 0
elastic_lb_service: 0
specify_ip_ranges: 0
inline: 0
is_persistent: 0
internal_lb: 0
public_lb: 1
egress_default_policy: 1
concurrent_connections: NULL
mysql> select * from ntwk_offering_service_map where network_offering_id=15;
+----+---------------------+----------------+---------------+---------------------+
| id | network_offering_id | service        | provider      | created             |
+----+---------------------+----------------+---------------+---------------------+
| 58 |                  15 | Dhcp           | VirtualRouter | 2013-08-05 07:30:38 |
| 55 |                  15 | Dns            | VirtualRouter | 2013-08-05 07:30:38 |
| 60 |                  15 | Firewall       | VirtualRouter | 2013-08-05 07:30:38 |
| 59 |                  15 | Lb             | Netscaler     | 2013-08-05 07:30:38 |
| 54 |                  15 | PortForwarding | VirtualRouter | 2013-08-05 07:30:38 |
| 56 |                  15 | SourceNat      | VirtualRouter | 2013-08-05 07:30:38 |
| 53 |                  15 | StaticNat      | VirtualRouter | 2013-08-05 07:30:38 |
| 57 |                  15 | UserData       | VirtualRouter | 2013-08-05 07:30:38 |
| 61 |                  15 | Vpn            | VirtualRouter | 2013-08-05 07:30:38 |
+----+---------------------+----------------+---------------+---------------------+
mysql> select * from host_details where host_id=4;
+----+---------+-------------------+-------------------------------------------+
| id | host_id | name              | value                                     |
+----+---------+-------------------+-------------------------------------------+
| 13 |       4 | deviceName        | NetscalerVPXLoadBalancer                  |
| 11 |       4 | guid              | 1cf71bde-3994-42eb-80e0-046278a1763d      |
| 21 |       4 | ip                | 10.147.60.26                              |
| 19 |       4 | lbdevicededicated | false                                     |
| 23 |       4 | lbdeviceid        | 1                                         |
| 16 |       4 | name              | 201-NetscalerVPXLoadBalancer-10.147.60.26 |
| 17 |       4 | numretries        | 2                                         |
| 18 |       4 | password          | ck3EWqTylg79ZMj4gG2sHA==                  |
| 20 |       4 | physicalNetworkId | 201                                       |
| 15 |       4 | privateinterface  | 1/2                                       |
| 14 |       4 | publicinterface   | 1/3                                       |
| 12 |       4 | username          | nsroot                                    |
| 22 |       4 | zoneId            | 2                                         |
+----+---------+-------------------+-------------------------------------------+
4. As a non-ROOT domain user, try to deploy a VM using a network that was
created from the above network offering.
mysql> select * from networks where id=242\G
*************************** 1. row ***************************
id: 242
name: test
uuid: c8028134-77ab-415a-bdb9-d9378754479b
display_text: test
traffic_type: Guest
broadcast_domain_type: Vlan
broadcast_uri: NULL
gateway: 10.0.48.1
cidr: 10.0.48.0/20
mode: Dhcp
network_offering_id: 15
physical_network_id: 200
data_center_id: 1
guru_name: ExternalGuestNetworkGuru
state: Allocated
related: 242
domain_id: 2
account_id: 4
dns1: 10.103.128.16
dns2: NULL
guru_data: NULL
set_fields: 0
acl_type: Account
network_domain: cs4cloud.internal
reservation_id: 803e1334-ed30-4980-a6f2-299427724bb9
guest_type: Isolated
restart_required: 0
created: 2013-08-05 11:27:19
removed: NULL
specify_ip_ranges: 0
vpc_id: NULL
ip6_gateway: NULL
ip6_cidr: NULL
network_cidr: NULL
display_network: 1
network_acl_id: NULL
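Steps 2 and 4 above can also be expressed as API requests. The following is a minimal sketch, not a tested reproduction script. It assumes the two global settings are changed through the updateConfiguration API command, and it reuses the IDs from the deployVirtualMachine request logged in this report. The endpoint API_BASE and the helper build_query are hypothetical names introduced for illustration, and request signing or session handling is left out.

```python
from urllib.parse import urlencode

# Hypothetical management-server endpoint; replace with your own.
API_BASE = "http://mgmt.example.com:8080/client/api"


def build_query(command, **params):
    """Return an unsigned API URL for the given command (illustrative helper)."""
    query = {"command": command, "response": "json"}
    query.update(params)
    return API_BASE + "?" + urlencode(query)


# Step 2: lower both GC settings to 10 seconds (setting names assumed, see above).
gc_urls = [
    build_query("updateConfiguration", name=name, value="10")
    for name in ("network.gc.interval", "network.gc.wait")
]

# Step 4: deploy a VM on the isolated network, using the IDs from this report.
deploy_url = build_query(
    "deployVirtualMachine",
    zoneId="7b6b3c07-7e33-483f-b2a1-2f89a0d9ff96",
    templateId="4643adee-fd8e-11e2-9c07-069f2c0000aa",
    hypervisor="KVM",
    serviceOfferingId="d42e0af6-370b-4a4f-a318-98d1d2a9a8e3",
    networkIds="c8028134-77ab-415a-bdb9-d9378754479b",
    displayname="test",
    name="test",
)

for url in gc_urls + [deploy_url]:
    print(url)
```

The URLs are only printed here; sending them requires an authenticated session or signed request against a real management server.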
Observations:
(i) deployVMCmd completes without any issues, but the network scavenger
shuts down the network immediately after the StartAnswer for the user VM.
(ii) Here is the deployVMCmd:
2013-08-05 16:57:19,427 DEBUG [cloud.api.ApiServlet] (catalina-exec-22:null)
===END=== 10.252.192.25 -- GET
command=deployVirtualMachine&zoneId=7b6b3c07-7e33-483f-b2a1-2f89a0d9ff96&templateId=4643adee-fd8e-11e2-9c07-069f2c0000aa&hypervisor=KVM&serviceOfferingId=d42e0af6-370b-4a4f-a318-98d1d2a9a8e3&networkIds=c8028134-77ab-415a-bdb9-d9378754479b&displayname=test&name=test&response=json&sessionkey=YGtsbnrjR7V3vmhttXR20I2v8L0%3D&_=1375702049488
(iii) The above command initiated a router VM deployment
2013-08-05 16:57:25,237 DEBUG [agent.transport.Request] (Job-Executor-8:job-59
= [ 17db0422-7b06-4bb6-b78a-ae9728bd26d1 ]) Seq 1-1321009484: Sending { Cmd ,
MgmtId: 7280707764394, via: 1, Ver: v1, Flags: 100011,
[{"com.cloud.agent.api.StartCommand":{"vm":{"id":14,"name":"r-14-VM","type":"DomainRouter","cpus":1,"minSpeed":500,"maxSpeed":500,"minRam":134217728,"maxRam":134217728,"arch":"x86_64","os":"Debian
GNU/Linux 5.0 (32-bit)","bootArgs":" template=domP name=r-14-VM
eth2ip=10.147.44.67 eth2mask=255.255.255.0 gateway=10.147.44.1 eth0ip=10.0.48.1
eth0mask=255.255.240.0 domain=cs4cloud.internal dhcprange=10.0.48.1
eth1ip=169.254.3.57 eth1mask=255.255.0.0 type=router disable_rp_filter=true
dns1=10.103.128.16","rebootOnCrash":false,"enableHA":true,"limitCpuUse":false,"enableDynamicallyScaleVm":false,"vncPassword":"821327d55071659e","params":{},"uuid":"3980de9d-1b91-4e7c-ae36-2b9a9d08ef38","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"a11a7197-2aed-40e6-9569-6787c577ab2c","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5458182e-bfcb-351c-97ed-e7223bca2b8e","id":1,"poolType":"NetworkFilesystem","host":"10.147.28.7","path":"/export/home/swamy/primary.campo.kvm.1.zone","port":2049}},"name":"ROOT-14","size":276162048,"path":"458958b5-5497-477e-9e53-1227a0187688","volumeId":17,"vmName":"r-14-VM","accountId":4,"format":"QCOW2","id":17,"hypervisorType":"None"}},"diskSeq":0,"type":"ROOT"}],"nics":[{"deviceId":2,"networkRateMbps":200,"defaultNic":true,"uuid":"65fd9e0c-5b15-4181-9ab1-4a13fea4739e","ip":"10.147.44.67","netmask":"255.255.255.0","gateway":"10.147.44.1","mac":"06:2a:3a:00:00:12","dns1":"10.103.128.16","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://44","isolationUri":"vlan://44","isSecurityGroupEnabled":false},{"deviceId":0,"networkRateMbps":200,"defaultNic":false,"uuid":"d2a0ff89-a6ce-4005-a875-d36787318342","ip":"10.0.48.1","netmask":"255.255.240.0","gateway":"10.0.48.1","mac":"02:00:6b:6d:00:02","dns1":"10.103.128.16","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://909","isolationUri":"vlan://909","isSecurityGroupEnabled":false},{"deviceId":1,"networkRateMbps":-1,"defaultNic":false,"uuid":"0a187643-dc28-48b5-bf99-8acbe5f753c2","ip":"169.254.3.57","netmask":"255.255.0.0","gateway":"169.254.0.1","mac":"0e:00:a9:fe:03:39","broadcastType":"LinkLocal","type":"Control","isSecurityGroupEnabled":false}]},"hostIp":"10.147.40.11","executeInSequence":false,"wait":0}},{"com.cloud.agent.api.check.CheckSshCommand":{"ip":"169.254.3.57","port":3922,"interval":6,"retries":100,"name":"r-14-VM","wait":0}},{"com.cloud.agent.api.GetDomRVersionCmd":{"accessDetails"
:{"router.ip":"169.254.3.57","router.name":"r-14-VM"},"wait":0}},{},{"com.cloud.agent.api.routing.IpAssocCommand":{"ipAddresses":[{"accountId":4,"publicIp":"10.147.44.67","sourceNat":true,"add":true,"oneToOneNat":false,"firstIP":true,"vlanId":"44","vlanGateway":"10.147.44.1","vlanNetmask":"255.255.255.0","vifMacAddress":"06:b0:5a:00:00:12","networkRate":200,"trafficType":"Public"}],"accessDetails":{"router.guest.ip":"10.0.48.1","zone.network.type":"Advanced","router.ip":"169.254.3.57","router.name":"r-14-VM"},"wait":0}}]
}
(iv) Router VM started successfully
2013-08-05 16:58:40,412 DEBUG [agent.transport.Request]
(AgentManager-Handler-5:null) Seq 1-1321009484: Processing: { Ans: , MgmtId:
7280707764394, via: 1, Ver: v1, Flags: 10,
[{"com.cloud.agent.api.StartAnswer":{"vm":{"id":14,"name":"r-14-VM","type":"DomainRouter","cpus":1,"minSpeed":500,"maxSpeed":500,"minRam":134217728,"maxRam":134217728,"arch":"x86_64","os":"Debian
GNU/Linux 5.0 (32-bit)","bootArgs":" template=domP name=r-14-VM
eth2ip=10.147.44.67 eth2mask=255.255.255.0 gateway=10.147.44.1 eth0ip=10.0.48.1
eth0mask=255.255.240.0 domain=cs4cloud.internal dhcprange=10.0.48.1
eth1ip=169.254.3.57 eth1mask=255.255.0.0 type=router disable_rp_filter=true
dns1=10.103.128.16","rebootOnCrash":false,"enableHA":true,"limitCpuUse":false,"enableDynamicallyScaleVm":false,"vncPassword":"821327d55071659e","vncAddr":"10.147.40.11","params":{},"uuid":"3980de9d-1b91-4e7c-ae36-2b9a9d08ef38","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"a11a7197-2aed-40e6-9569-6787c577ab2c","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5458182e-bfcb-351c-97ed-e7223bca2b8e","id":1,"poolType":"NetworkFilesystem","host":"10.147.28.7","path":"/export/home/swamy/primary.campo.kvm.1.zone","port":2049}},"name":"ROOT-14","size":276162048,"path":"458958b5-5497-477e-9e53-1227a0187688","volumeId":17,"vmName":"r-14-VM","accountId":4,"format":"QCOW2","id":17,"hypervisorType":"None"}},"diskSeq":0,"type":"ROOT"}],"nics":[{"deviceId":2,"networkRateMbps":200,"defaultNic":true,"uuid":"65fd9e0c-5b15-4181-9ab1-4a13fea4739e","ip":"10.147.44.67","netmask":"255.255.255.0","gateway":"10.147.44.1","mac":"06:2a:3a:00:00:12","dns1":"10.103.128.16","broadcastType":"Vlan","type":"Public","broadcastUri":"vlan://44","isolationUri":"vlan://44","isSecurityGroupEnabled":false},{"deviceId":0,"networkRateMbps":200,"defaultNic":false,"uuid":"d2a0ff89-a6ce-4005-a875-d36787318342","ip":"10.0.48.1","netmask":"255.255.240.0","gateway":"10.0.48.1","mac":"02:00:6b:6d:00:02","dns1":"10.103.128.16","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://909","isolationUri":"vlan://909","isSecurityGroupEnabled":false},{"deviceId":1,"networkRateMbps":-1,"defaultNic":false,"uuid":"0a187643-dc28-48b5-bf99-8acbe5f753c2","ip":"169.254.3.57","netmask":"255.255.0.0","gateway":"169.254.0.1","mac":"0e:00:a9:fe:03:39","broadcastType":"LinkLocal","type":"Control","isSecurityGroupEnabled":false}]},"result":true,"wait":0}},{"com.cloud.agent.api.check.CheckSshAnswer":{"result":true,"wait":0}},{"com.cloud.agent.api.GetDomRVersionAnswer":{"templateVersion":"Cloudstack
Release 4.2.0 Thu Jun 13 04:15:09 UTC
2013","scriptsVersion":"0026a7d7d957616f59bdeab0c49258bb","result":true,"details":"Cloudstack
Release 4.2.0 Thu Jun 13 04:15:09 UTC
2013&0026a7d7d957616f59bdeab0c49258bb","wait":0}},{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-14-VM","bytesSent":0,"bytesReceived":0,"result":true,"wait":0}},{"com.cloud.agent.api.routing.IpAssocAnswer":{"results":["10.147.44.67
- success"],"result":true,"wait":0}}] }
(v) After the router VM was up, it triggered the StartCommand for the user VM,
which also succeeded.
2013-08-05 16:58:43,001 DEBUG [agent.transport.Request]
(AgentManager-Handler-14:null) Seq 1-1321009496: Processing: { Ans: , MgmtId:
7280707764394, via: 1, Ver: v1, Flags: 10,
[{"com.cloud.agent.api.StartAnswer":{"vm":{"id":13,"name":"i-4-13-VM","type":"User","cpus":1,"minSpeed":500,"maxSpeed":500,"minRam":536870912,"maxRam":536870912,"arch":"x86_64","os":"CentOS
5.5
(64-bit)","bootArgs":"","rebootOnCrash":false,"enableHA":false,"limitCpuUse":false,"enableDynamicallyScaleVm":false,"vncPassword":"b1f3f5cddd630be3","vncAddr":"10.147.40.11","params":{},"uuid":"773ccd08-cfef-41ad-8c43-30a169630d0b","disks":[{"data":{"org.apache.cloudstack.storage.to.VolumeObjectTO":{"uuid":"ffa1c5ad-cdca-40dc-93bd-03c1dbdd4caf","volumeType":"ROOT","dataStore":{"org.apache.cloudstack.storage.to.PrimaryDataStoreTO":{"uuid":"5458182e-bfcb-351c-97ed-e7223bca2b8e","id":1,"poolType":"NetworkFilesystem","host":"10.147.28.7","path":"/export/home/swamy/primary.campo.kvm.1.zone","port":2049}},"name":"ROOT-13","size":8589934592,"path":"ad4b1306-a145-4944-b296-775671c9624b","volumeId":16,"vmName":"i-4-13-VM","accountId":4,"format":"QCOW2","id":16,"hypervisorType":"None"}},"diskSeq":0,"type":"ROOT"},{"data":{"org.apache.cloudstack.storage.to.TemplateObjectTO":{"id":0,"format":"ISO","accountId":0,"hvm":false}},"diskSeq":3,"type":"ISO"}],"nics":[{"deviceId":0,"networkRateMbps":200,"defaultNic":true,"uuid":"302aa348-f8aa-4872-ad59-142500ed9f63","ip":"10.0.49.81","netmask":"255.255.240.0","gateway":"10.0.48.1","mac":"02:00:1a:0c:00:01","dns1":"10.103.128.16","broadcastType":"Vlan","type":"Guest","broadcastUri":"vlan://909","isolationUri":"vlan://909","isSecurityGroupEnabled":false}]},"result":true,"wait":0}}]
}
(vi) One additional observation: after the router VM started, nics.state for
the user VM stayed "Allocated" for some time.
*************************** 33. row ***************************
id: 33
uuid: 302aa348-f8aa-4872-ad59-142500ed9f63
instance_id: 13
mac_address: 02:00:1a:0c:00:01
ip4_address: 10.0.49.81
netmask: NULL
gateway: NULL
ip_type: Ip4
broadcast_uri: NULL
network_id: 242
mode: Dhcp
state: Allocated
strategy: Start
reserver_name: ExternalGuestNetworkGuru
reservation_id: NULL
device_id: 0
update_time: 2013-08-05 16:57:19
isolation_uri: NULL
ip6_address: NULL
default_nic: 1
vm_type: User
created: 2013-08-05 11:27:19
removed: NULL
ip6_gateway: NULL
ip6_cidr: NULL
secondary_ip: 0
display_nic: 1
(vii) Once the user VM is up, nics.state for the user VM NIC changes to
"Reserved".
mysql> select * from nics where instance_id=13\G
*************************** 1. row ***************************
id: 33
uuid: 302aa348-f8aa-4872-ad59-142500ed9f63
instance_id: 13
mac_address: 02:00:1a:0c:00:01
ip4_address: 10.0.49.81
netmask: 255.255.240.0
gateway: 10.0.48.1
ip_type: Ip4
broadcast_uri: vlan://909
network_id: 242
mode: Dhcp
state: Reserved
strategy: Start
reserver_name: ExternalGuestNetworkGuru
reservation_id: 803e1334-ed30-4980-a6f2-299427724bb9
device_id: 0
update_time: 2013-08-05 16:58:40
isolation_uri: vlan://909
ip6_address: NULL
default_nic: 1
vm_type: User
created: 2013-08-05 11:27:19
removed: NULL
ip6_gateway: NULL
ip6_cidr: NULL
secondary_ip: 0
display_nic: 1
1 row in set (0.00 sec)
(viii) As soon as the StartAnswer arrives for the user VM, the network
scavenger thread initiates a network shutdown.
Here is the snippet from the management server logs.
2013-08-05 16:58:45,756 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-2:null) SeqA 2-790: Processing Seq 2-790: { Cmd ,
MgmtId: -1, via: 2, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":2,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
2013-08-05 16:58:45,762 DEBUG [agent.manager.AgentManagerImpl]
(AgentManager-Handler-2:null) SeqA 2-790: Sending Seq 2-790: { Ans: , MgmtId:
7280707764394, via: 2, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2013-08-05 16:58:46,685 DEBUG [network.resource.NetscalerResource]
(DirectAgent-169:null) Netscaler load balancer 10.147.60.26 successfully
executed IPAssocCommand to remove IP com.cloud.agent.api.to.IpAddressTO@1e6c629
2013-08-05 16:58:46,685 DEBUG [agent.manager.DirectAgentAttache]
(DirectAgent-169:null) Seq 8-1242890258: Response Received:
2013-08-05 16:58:46,685 DEBUG [agent.transport.Request] (DirectAgent-169:null)
Seq 8-1242890258: Processing: { Ans: , MgmtId: 7280707764394, via: 8, Ver: v1,
Flags: 10, [{"com.cloud.agent.api.routing.IpAssocAnswer":{"results":["null -
success"],"result":true,"wait":0}}] }
2013-08-05 16:58:46,686 DEBUG [agent.transport.Request]
(Network-Scavenger-1:null) Seq 8-1242890258: Received: { Ans: , MgmtId:
7280707764394, via: 8, Ver: v1, Flags: 10, { IpAssocAnswer } }
2013-08-05 16:58:46,800 DEBUG
[cloud.network.ExternalLoadBalancerDeviceManagerImpl]
(Network-Scavenger-1:null) External load balancer has shut down the guest
network for account dom1Acc2(id = 4) with VLAN tag 909
2013-08-05 16:58:46,804 DEBUG [cloud.network.NetworkManagerImpl]
(Network-Scavenger-1:null) Sending network shutdown to VirtualRouter
2013-08-05 16:58:46,808 DEBUG
[network.router.VirtualNetworkApplianceManagerImpl] (Network-Scavenger-1:null)
Stopping router VM[DomainRouter|r-14-VM]
2013-08-05 16:58:46,817 DEBUG [cloud.capacity.CapacityManagerImpl]
(Network-Scavenger-1:null) VM state transitted from :Running to Stopping with
event: StopRequestedvm's original host id: 1 new host id: 1 host id before
state transition: 1
2013-08-05 16:58:46,945 DEBUG [agent.transport.Request]
(AgentManager-Handler-1:null) Seq 1-1321009498: Processing: { Ans: , MgmtId:
7280707764394, via: 1, Ver: v1, Flags: 10,
[{"com.cloud.agent.api.NetworkUsageAnswer":{"routerName":"r-14-VM","bytesSent":0,"bytesReceived":0,"result":true,"details":"","wait":0}}]
}
Attaching all the required logs along with db dump to the bug.
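
To make the timing in observations (ii) through (viii) easier to read, the sketch below computes the offsets between the log timestamps quoted in this report. It is plain arithmetic on those timestamps and makes no claim about how the scavenger decides that a network is idle. The event labels and the helper names parse_ts and offsets_from_start are made up for illustration; the 10-second interval is the value from step 2.

```python
from datetime import datetime

# Timestamps copied from the management-server log lines quoted in this report.
EVENTS = [
    ("deployVirtualMachine request logged", "2013-08-05 16:57:19,427"),
    ("router StartCommand sent", "2013-08-05 16:57:25,237"),
    ("router StartAnswer", "2013-08-05 16:58:40,412"),
    ("user VM StartAnswer", "2013-08-05 16:58:43,001"),
    ("scavenger stops router", "2013-08-05 16:58:46,808"),
]
GC_INTERVAL_S = 10  # value from step 2 of the reproduction steps


def parse_ts(text):
    """Parse a log timestamp with comma-separated milliseconds."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def offsets_from_start(events):
    """Map each event label to the seconds elapsed since the first event."""
    start = parse_ts(events[0][1])
    return {name: (parse_ts(ts) - start).total_seconds() for name, ts in events}


offsets = offsets_from_start(EVENTS)
for name, seconds in offsets.items():
    print(f"{seconds:8.3f}s  {name}")

# Gap between the user VM coming up and the router being stopped.
gap = offsets["scavenger stops router"] - offsets["user VM StartAnswer"]
# Whole GC intervals that fit into the deployment (illustrative arithmetic only).
gc_intervals = int(offsets["scavenger stops router"] // GC_INTERVAL_S)
print(f"gap after user VM StartAnswer: {gap:.3f}s, GC intervals elapsed: {gc_intervals}")
```

With the timestamps above, the router is stopped about 3.8 seconds after the user VM's StartAnswer and about 87 seconds after the API request was logged, so several 10-second scavenger intervals elapse while the deployment is still in progress.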