Dear all, I am using CloudStack version 4.2.0 with the KVM hypervisor. I have noticed some strange behaviour: when an agent gets disconnected from the management server and I restart the cloudstack-agent service to reconnect it, it takes a very long time to reconnect. Most of the time, it stops most, if not all, of the running VMs on the host before it finally manages to reconnect.
I checked the management server logs, and it seems these are the entries which caused the VMs to be stopped:
====
2015-07-02 02:12:13,556 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py destroy_network_rules_for_vm --vmname i-648-2613-VM --vif vnet9
2015-07-02 02:12:13,711 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Execution is successful.
2015-07-02 02:12:13,712 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Try to stop the vm at first
2015-07-02 02:12:15,716 DEBUG [utils.script.Script] (agentRequest-Handler-3:null) Executing: /bin/bash -c ls /sys/class/net/breth1-8/brif | grep vnet
2015-07-02 02:12:15,741 DEBUG [utils.script.Script] (agentRequest-Handler-3:null) Execution is successful.
2015-07-02 02:12:15,742 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) Processing command: com.cloud.agent.api.StopCommand
2015-07-02 02:12:16,253 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Executing: /usr/share/cloudstack-common/scripts/vm/network/security_group.py destroy_network_rules_for_vm --vmname i-2-1779-VM
2015-07-02 02:12:16,423 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Execution is successful.
2015-07-02 02:12:16,424 DEBUG [kvm.resource.LibvirtComputingResource] (agentRequest-Handler-3:null) Try to stop the vm at first
2015-07-02 02:12:16,426 DEBUG [utils.script.Script] (agentRequest-Handler-3:null) Executing: /bin/bash -c ls /sys/class/net/breth1-8/brif | grep vnet
2015-07-02 02:12:16,456 DEBUG [utils.script.Script] (agentRequest-Handler-3:null) Execution is successful.
2015-07-02 02:12:16,457 DEBUG [cloud.agent.Agent] (agentRequest-Handler-3:null) Processing command: com.cloud.agent.api.StopCommand
====
Is there any reason why the network rules need to be destroyed?
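For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch that lists which VMs had their network rules destroyed, and when. The function name `vms_with_rules_destroyed` and the regular expression are my own illustration, not part of CloudStack, and they assume the log lines have the same layout as the excerpt above.

```python
import re

# Assumed layout: "<timestamp> DEBUG [...] ... destroy_network_rules_for_vm --vmname <name> ..."
# This pattern is only inferred from the log excerpt; other versions may log differently.
RULES_DESTROYED = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"destroy_network_rules_for_vm --vmname (?P<vm>\S+)"
)


def vms_with_rules_destroyed(lines):
    """Return (timestamp, vm_name) for each destroy_network_rules_for_vm call."""
    hits = []
    for line in lines:
        match = RULES_DESTROYED.search(line)
        if match:
            hits.append((match.group("ts"), match.group("vm")))
    return hits


# Three lines copied from the excerpt above.
sample = [
    "2015-07-02 02:12:13,556 DEBUG [kvm.resource.LibvirtComputingResource] "
    "(agentRequest-Handler-3:null) Executing: "
    "/usr/share/cloudstack-common/scripts/vm/network/security_group.py "
    "destroy_network_rules_for_vm --vmname i-648-2613-VM --vif vnet9",
    "2015-07-02 02:12:13,711 DEBUG [kvm.resource.LibvirtComputingResource] "
    "(agentRequest-Handler-3:null) Execution is successful.",
    "2015-07-02 02:12:16,253 DEBUG [kvm.resource.LibvirtComputingResource] "
    "(agentRequest-Handler-3:null) Executing: "
    "/usr/share/cloudstack-common/scripts/vm/network/security_group.py "
    "destroy_network_rules_for_vm --vmname i-2-1779-VM",
]

for timestamp, vm in vms_with_rules_destroyed(sample):
    print(timestamp, vm)
```

To run it against a real log, pass the open file object instead of `sample`, since the function only iterates over lines.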
How can I prevent VMs from being stopped when the agent reconnects to the management server? Is anyone else seeing similar behaviour? Looking forward to your reply, thank you. Cheers.