[ https://issues.apache.org/jira/browse/CLOUDSTACK-9255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Milamber updated CLOUDSTACK-9255:
---------------------------------
    Attachment: anon-rvr-2nd-after-20.log

The cloud.log of the RVR (master) before the kill

> Unable to start VM DomainRouter due to error in finalizeStart, not retrying
> ---------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-9255
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9255
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.)
>          Components: Virtual Router
>    Affects Versions: 4.7.0, 4.6.2, 4.8.0, 4.7.1
>         Environment: Ubuntu 14.04.3
>                      KVM
>                      NFS (primary/secondary)
>            Reporter: Milamber
>         Attachments: anon-rvr-2nd-after-20.log
>
>
> I've spent 3 days on the same issue: I am unable to restart a network with
> clean up (virtual router or redundant virtual router) if the network has at
> least 20 virtual machines.
> I've tested with CS 4.6.2, 4.7.0, 4.7.1RC1, 4.8.0RC1: same problem. I've used
> the system VM from apt-get.eu and the latest builds from Jenkins.
> My tests are run with hosts/mgr on Ubuntu 14.04.3 / KVM / NFS
> primary/secondary.
> My test case (with ansible modules):
> 1/ create a new network (normal or RVR)
> 2/ create 20 VMs (same params, only the name changes) and wait for the
> creation to finish
> 3/ restart the network with the clean up option
> 4/ wait for the restart; after some minutes, an error message appears:
> "Failed to restart network"
> The trace in management.log is:
> 2016-01-23 23:02:51,503 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-51:ctx-9ed51622 job-268/job-271) (logid:b9a521fa) Unable to complete AsyncJobVO {id:271, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo: rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAMnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw, cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 146456419427, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Jan 23 22:56:00 CET 2016}, job origin:268
> com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1119)
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
>     at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
>     at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4734)
>     at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
>     at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
>     at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
>     at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1083)
>     at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
>     at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
>     ... 17 more
>
> During the restart of the network I can connect to the VR over ssh using its
> link-local address; the last lines show:
> 2016-01-23 22:02:39,780 configure.py __init__:128 AclIP created for rule ==> {'last_port': 65535, u'protocol': u'tcp', u'revoked': False, u'already_added': True, u'source_cidr_list': [u'0.0.0.0/0'], 'cidr': [u'0.0.0.0/0'], u'id': 52, u'src_ip': u'192.168.13.30', u'purpose': u'Firewall', 'allowed': True, 'action': 'ACCEPT', u'src_port_range': [1, 65535], u'traffic_type': u'Ingress', 'type': u'tcp', u'default_egress_policy': False, 'first_port': 1}
> 2016-01-23 22:02:39,780 configure.py add_rule:165 Current ACL IP direction is ==> ingress
> 2016-01-23 22:02:39,780 merge.py load:60 Loading data bag type forwardingrules
> Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):
> The system is going down for system halt NOW!
> Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):
> Power button pressed
> The system is going down for system halt NOW!
> /opt/cloud/bin/vr_cfg.sh: line 60: 16845 Killed /opt/cloud/bin/update_config.py vm_metadata.json
> Sat Jan 23 22:02:46 UTC 2016 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_metadata.json
> Connection to 169.254.2.186 closed by remote host.
> Connection to 169.254.2.186 closed.
>
> Perhaps this is a timeout issue? If I create one VM or 10 VMs, the network
> restart works.
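
The timeout question can be checked roughly against the timestamps quoted in the report. The following is a minimal sketch, not part of the original report. It assumes that the management.log timestamps are in CET (as the job's `created` field states), that the VR clock is in UTC (as the "VR config: executing failed" line states), and that the broadcast message uses the same VR clock. The variable and function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET is UTC+1 in January (no daylight saving time in effect).
CET = timezone(timedelta(hours=1))

# Timestamps copied from the report.
job_created = datetime(2016, 1, 23, 22, 56, 0, tzinfo=CET)          # AsyncJobVO "created"
vr_halt = datetime(2016, 1, 23, 22, 2, 45, tzinfo=timezone.utc)     # VR broadcast message
mgmt_error = datetime(2016, 1, 23, 23, 2, 51, 503000, tzinfo=CET)   # ERROR line in management.log


def elapsed_seconds(start, end):
    """Return the number of seconds between two timezone-aware datetimes."""
    return (end - start).total_seconds()


halt_after = elapsed_seconds(job_created, vr_halt)
error_after = elapsed_seconds(job_created, mgmt_error)

print(f"VR halted {halt_after:.0f} s after the job was created")
print(f"Error logged {error_after:.3f} s after the job was created")
```

Under these assumptions, the router is halted 405 seconds after the start job was created, and the error is logged about 6 seconds later. This is consistent with some limit of a few minutes being hit while update_config.py is still running, but the figures alone do not identify which setting, if any, is responsible.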