Milamber created CLOUDSTACK-9255:
------------------------------------

             Summary: Unable to start VM DomainRouter due to error in finalizeStart, not retrying
                 Key: CLOUDSTACK-9255
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-9255
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Virtual Router
    Affects Versions: 4.7.0, 4.6.2, 4.8.0, 4.7.1
         Environment: Ubuntu 14.04.3, KVM, NFS (primary/secondary)
            Reporter: Milamber



I've spent 3 days on the same issue: I'm unable to restart a network with the
cleanup option (virtual router or redundant virtual router) if the network has
at least 20 virtual machines.

I've tested with CS 4.6.2, 4.7.0, 4.7.1RC1 and 4.8.0RC1, and all show the same
problem. I've used the system VM templates from apt-get.eu and the latest
builds from Jenkins.

My tests were run with the hosts and management server on Ubuntu 14.04.3, KVM,
and NFS primary/secondary storage.

My test case (with Ansible modules):
1/ create a new network (normal or RVR)
2/ create 20 VMs (same parameters, only the name changes) and wait for the
creation to finish
3/ restart the network with the cleanup option
4/ wait for the restart; after a few minutes, an error message appears:
"Failed to restart network"

The trace in management.log is:

2016-01-23 23:02:51,503 ERROR [c.c.v.VmWorkJobDispatcher] (Work-Job-Executor-51:ctx-9ed51622 job-268/job-271) (logid:b9a521fa) Unable to complete AsyncJobVO {id:271, userId: 2, accountId: 2, instanceType: null, instanceId: null, cmd: com.cloud.vm.VmWorkStart, cmdInfo:
rO0ABXNyABhjb20uY2xvdWQudm0uVm1Xb3JrU3RhcnR9cMGsvxz73gIAC0oABGRjSWRMAAZhdm9pZHN0ADBMY29tL2Nsb3VkL2RlcGxveS9EZXBsb3ltZW50UGxhbm5lciRFeGNsdWRlTGlzdDtMAAljbHVzdGVySWR0ABBMamF2YS9sYW5nL0xvbmc7TAAGaG9zdElkcQB-AAJMAAtqb3VybmFsTmFtZXQAEkxqYXZhL2xhbmcvU3RyaW5nO0wAEXBoeXNpY2FsTmV0d29ya0lkcQB-AAJMAAdwbGFubmVycQB-AANMAAVwb2RJZHEAfgACTAAGcG9vbElkcQB-AAJMAAlyYXdQYXJhbXN0AA9MamF2YS91dGlsL01hcDtMAA1yZXNlcnZhdGlvbklkcQB-AAN4cgATY29tLmNsb3VkLnZtLlZtV29ya5-ZtlbwJWdrAgAESgAJYWNjb3VudElkSgAGdXNlcklkSgAEdm1JZEwAC2hhbmRsZXJOYW1lcQB-AAN4cAAAAAAAAAACAAAAAAAAAAIAAAAAAAAAMnQAGVZpcnR1YWxNYWNoaW5lTWFuYWdlckltcGwAAAAAAAAAAHBwcHBwcHBwc3IAEWphdmEudXRpbC5IYXNoTWFwBQfawcMWYNEDAAJGAApsb2FkRmFjdG9ySQAJdGhyZXNob2xkeHA_QAAAAAAADHcIAAAAEAAAAAF0AA5SZXN0YXJ0TmV0d29ya3QAP3JPMEFCWE55QUJGcVlYWmhMbXhoYm1jdVFtOXZiR1ZoYnMwZ2NvRFZuUHJ1QWdBQldnQUZkbUZzZFdWNGNBRXhw,
cmdVersion: 0, status: IN_PROGRESS, processStatus: 0, resultCode: 0, result: null, initMsid: 146456419427, completeMsid: null, lastUpdated: null, lastPolled: null, created: Sat Jan 23 22:56:00 CET 2016}, job origin:268
com.cloud.exception.AgentUnavailableException: Resource [Host:1] is unreachable: Host 1: Unable to start instance due to Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1119)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
    at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at com.cloud.vm.VmWorkJobHandlerProxy.handleVmWorkJob(VmWorkJobHandlerProxy.java:107)
    at com.cloud.vm.VirtualMachineManagerImpl.handleVmWorkJob(VirtualMachineManagerImpl.java:4734)
    at com.cloud.vm.VmWorkJobDispatcher.runJob(VmWorkJobDispatcher.java:102)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.runInContext(AsyncJobManagerImpl.java:554)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
    at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
    at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
    at org.apache.cloudstack.framework.jobs.impl.AsyncJobManagerImpl$5.run(AsyncJobManagerImpl.java:502)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.cloud.utils.exception.ExecutionException: Unable to start VM[DomainRouter|r-50-VM] due to error in finalizeStart, not retrying
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1083)
    at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:4578)
    at sun.reflect.GeneratedMethodAccessor374.invoke(Unknown Source)
    ... 17 more


During the network restart, I can connect to the VR over SSH on its link-local
address. The last lines of its output are:

2016-01-23 22:02:39,780  configure.py __init__:128 AclIP created for rule ==> {'last_port': 65535, u'protocol': u'tcp', u'revoked': False, u'already_added': True, u'source_cidr_list': [u'0.0.0.0/0'], 'cidr': [u'0.0.0.0/0'], u'id': 52, u'src_ip': u'192.168.13.30', u'purpose': u'Firewall', 'allowed': True, 'action': 'ACCEPT', u'src_port_range': [1, 65535], u'traffic_type': u'Ingress', 'type': u'tcp', u'default_egress_policy': False, 'first_port': 1}
2016-01-23 22:02:39,780  configure.py add_rule:165 Current ACL IP direction is ==> ingress
2016-01-23 22:02:39,780  merge.py load:60 Loading data bag type forwardingrules

Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):

The system is going down for system halt NOW!

Broadcast message from root@r-50-VM (Sat Jan 23 22:02:45 2016):

Power button pressed
The system is going down for system halt NOW!
/opt/cloud/bin/vr_cfg.sh: line 60: 16845 Killed                  /opt/cloud/bin/update_config.py vm_metadata.json
Sat Jan 23 22:02:46 UTC 2016 : VR config: executing failed: /opt/cloud/bin/update_config.py vm_metadata.json
Connection to 169.254.2.186 closed by remote host.
Connection to 169.254.2.186 closed.



Perhaps this is a timeout issue? If I create only 1 or 10 VMs, the network
restart works.
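
The timestamps already in the logs above give a rough idea of how long the
router start job ran before the VR was halted. The following is a small Python
check built on several assumptions. First, management.log is written in CET,
which is UTC+1 in January, like the job's own "created" field. Second, the VR
clock is in UTC, as its own output states. Third, both clocks are roughly in
sync.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1))  # Central European Time in January (no DST)

# Values copied from the logs above.
job_created = datetime(2016, 1, 23, 22, 56, 0, tzinfo=CET)       # AsyncJobVO "created" (job-271)
vr_halt = datetime(2016, 1, 23, 22, 2, 45, tzinfo=timezone.utc)   # "system halt" broadcast in the VR
mgmt_error = datetime(2016, 1, 23, 23, 2, 51, tzinfo=CET)        # ERROR line (assumed CET)

start_to_halt = vr_halt - job_created
halt_to_error = mgmt_error - vr_halt
print("start job -> VR halt:", start_to_halt)    # 0:06:45
print("VR halt -> ERROR logged:", halt_to_error)  # 0:00:06
```

Under these assumptions, the router was shut down about 6m45s into its start
job, while update_config.py was still running, and the ERROR followed a few
seconds later. This fits a start or configuration timeout being hit, but does
not prove it.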





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
