Jeffrey Nguyen created STRATOS-1293:
---------------------------------------
Summary: Stratos should remove instance in ERROR state before
trying to re-launch instance
Key: STRATOS-1293
URL: https://issues.apache.org/jira/browse/STRATOS-1293
Project: Stratos
Issue Type: Bug
Components: Cloud Controller
Affects Versions: 4.0.0
Environment: OpenStack Icehouse, Stratos 4.0.0 GA
Reporter: Jeffrey Nguyen
On my setup with OpenStack Icehouse and Stratos 4.0.0 GA, I observed one
particular cartridge with one running instance and multiple instances in ERROR
state. Upon checking wso2carbon.log, I found several occurrences of the
exception below. It looks like when Stratos launched the cartridge, the
instance didn't reach the running state, so Stratos tried to launch another
instance. This kept going until one instance of the cartridge eventually
reached running status.
We need to make sure that when this condition occurs, Stratos removes the
instances that are in ERROR state before attempting to re-launch. Instances
left in ERROR state can exhaust resources on the underlying IaaS cluster
(OpenStack in this case).
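To illustrate the requested behavior, here is a minimal sketch of "remove
ERROR instances, then re-launch". It is not the Stratos Cloud Controller or
jclouds API: the Iaas interface, the Status enum, InMemoryIaas, and
removeErroredThenLaunch are all hypothetical names made up for this example,
and the in-memory class only stands in for a real IaaS so the sketch can run
on its own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ErrorInstanceCleanup {

    /** Hypothetical instance states, loosely modeled on what the report describes. */
    public enum Status { PENDING, RUNNING, ERROR }

    /** Hypothetical stand-in for an IaaS client scoped to one cluster (assumption, not a real API). */
    public interface Iaas {
        Map<String, Status> listInstances();
        void terminate(String nodeId);
        String launch();
    }

    /**
     * Terminates every instance currently in ERROR state, then launches one
     * replacement. Returns the IDs of the instances that were removed.
     */
    public static List<String> removeErroredThenLaunch(Iaas iaas) {
        List<String> removed = new ArrayList<>();
        for (Map.Entry<String, Status> entry : iaas.listInstances().entrySet()) {
            if (entry.getValue() == Status.ERROR) {
                iaas.terminate(entry.getKey());
                removed.add(entry.getKey());
            }
        }
        iaas.launch();
        return removed;
    }

    /** In-memory fake used only to exercise the sketch. */
    public static class InMemoryIaas implements Iaas {
        private final Map<String, Status> nodes = new LinkedHashMap<>();
        private int counter = 0;

        public void put(String nodeId, Status status) {
            nodes.put(nodeId, status);
        }

        @Override
        public Map<String, Status> listInstances() {
            // Return a snapshot so callers can terminate nodes while iterating.
            return new LinkedHashMap<>(nodes);
        }

        @Override
        public void terminate(String nodeId) {
            nodes.remove(nodeId);
        }

        @Override
        public String launch() {
            String id = "new-node-" + (++counter);
            nodes.put(id, Status.PENDING);
            return id;
        }
    }
}
```

In a real fix the cleanup would presumably happen in the Cloud Controller when
the node fails to reach running status; where exactly that belongs is left
open here.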
TID: [0] [STRATOS] [2015-03-24 06:11:34,509] ERROR
{org.apache.stratos.autoscaler.monitor.ClusterMonitor} - Cluster monitor:
Monitor failed.ClusterMonitor [clusterId=lb01.lb01.domain, serviceId=lb01,
deploymentPolicy=Deployment Policy [id]static-1-Core [partitions]
[org.apache.stratos.cloud.controller.stub.deployment.partition.Partition@fb6144],
autoscalePolicy=ASPolicy [id=economyPolicy, displayName=null,
description=null], lbReferenceType=null, hasPrimary=false ]
{org.apache.stratos.autoscaler.monitor.ClusterMonitor}
Exception executing consequence for rule "Minimum Rule" in
org.apache.stratos.autoscaler.rule: java.lang.RuntimeException: cannot invoke
method: delegateSpawn
at
org.drools.runtime.rule.impl.DefaultConsequenceExceptionHandler.handleException(DefaultConsequenceExceptionHandler.java:39)
at
org.drools.common.DefaultAgenda.fireActivation(DefaultAgenda.java:1297)
at org.drools.common.DefaultAgenda.fireNextItem(DefaultAgenda.java:1221)
at org.drools.common.DefaultAgenda.fireAllRules(DefaultAgenda.java:1456)
at
org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:710)
at
org.drools.common.AbstractWorkingMemory.fireAllRules(AbstractWorkingMemory.java:674)
at
org.drools.impl.StatefulKnowledgeSessionImpl.fireAllRules(StatefulKnowledgeSessionImpl.java:230)
at
org.apache.stratos.autoscaler.rule.AutoscalerRuleEvaluator.evaluateMinCheck(AutoscalerRuleEvaluator.java:94)
at
org.apache.stratos.autoscaler.monitor.ClusterMonitor.monitor(ClusterMonitor.java:157)
at
org.apache.stratos.autoscaler.monitor.ClusterMonitor.run(ClusterMonitor.java:86)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: cannot invoke method: delegateSpawn
at
org.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:63)
at
org.mvel2.optimizers.impl.refl.nodes.VariableAccessor.getValue(VariableAccessor.java:37)
at org.mvel2.ast.ASTNode.getReducedValueAccelerated(ASTNode.java:108)
at org.mvel2.MVELRuntime.execute(MVELRuntime.java:85)
at
org.mvel2.compiler.CompiledExpression.getDirectValue(CompiledExpression.java:123)
at
org.mvel2.compiler.CompiledExpression.getValue(CompiledExpression.java:119)
at org.mvel2.MVEL.executeExpression(MVEL.java:930)
at org.drools.base.mvel.MVELConsequence.evaluate(MVELConsequence.java:104)
at org.drools.common.DefaultAgenda.fireActivation(DefaultAgenda.java:1287)
... 9 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.mvel2.optimizers.impl.refl.nodes.MethodAccessor.getValue(MethodAccessor.java:48)
... 17 more
Caused by: java.lang.RuntimeException: Cannot spawn an instance
at
org.apache.stratos.autoscaler.rule.RuleTasksDelegator.delegateSpawn(RuleTasksDelegator.java:107)
... 22 more
Caused by:
org.apache.stratos.autoscaler.exception.SpawningException: Failed to start an
instance. MemberContext
[memberId=lb01.lb01.domaincde68af9-d82f-42c3-9c01-4fd8da565867, nodeId=null,
clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null,
publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177492261,
lbClusterId=null, networkPartitionId=N1] Cause: error running 1 node
group(lb01lb01) location(RegionOne) image(0299be4d-a743-4424-ae28-f40bd4faa669)
size(2e2e2b47-9f40-4bd8-9777-83e802f5f1cd) options({inboundPorts=[],
autoAssignFloatingIp=false, securityGroupNames=[default], keyPairName=phoenix,
userData=[B@14edd531, configDrive=false,
novaNetworks=[Network{networkUuid=42c4a88d-0d59-4fbb-90f0-9b9806f9c17c,
portUuid=null, fixedIp=172.16.2.201},
Network{networkUuid=6a2615e4-760c-4c93-895d-b4b16e550193, portUuid=null,
fixedIp=10.81.69.201},
Network{networkUuid=670550f0-67fc-48ff-a33c-e184a7908247, portUuid=null,
fixedIp=10.13.5.81}]})
Execution failures:
0 error[s]
Node failures:
1) IllegalStateException on node
RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10:java.lang.IllegalStateException:
node(RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10) didn't achieve the status
running; aborting after 1 seconds with final status: ERROR
at
org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:72)
at
org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:45)
at
org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:121)
at
org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
at
org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
at
com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
1 error[s]
at
org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient.spawnAnInstance(CloudControllerClient.java:174)
at
org.apache.stratos.autoscaler.rule.RuleTasksDelegator.delegateSpawn(RuleTasksDelegator.java:87)
... 22 more
Caused by: org.apache.axis2.AxisFault: Failed to start an
instance. MemberContext
[memberId=lb01.lb01.domaincde68af9-d82f-42c3-9c01-4fd8da565867, nodeId=null,
clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null,
publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177492261,
lbClusterId=null, networkPartitionId=N1] Cause: error running 1 node
group(lb01lb01) location(RegionOne) image(0299be4d-a743-4424-ae28-f40bd4faa669)
size(2e2e2b47-9f40-4bd8-9777-83e802f5f1cd) options({inboundPorts=[],
autoAssignFloatingIp=false, securityGroupNames=[default], keyPairName=phoenix,
userData=[B@14edd531, configDrive=false,
novaNetworks=[Network{networkUuid=42c4a88d-0d59-4fbb-90f0-9b9806f9c17c,
portUuid=null, fixedIp=172.16.2.201},
Network{networkUuid=6a2615e4-760c-4c93-895d-b4b16e550193, portUuid=null,
fixedIp=10.81.69.201},
Network{networkUuid=670550f0-67fc-48ff-a33c-e184a7908247, portUuid=null,
fixedIp=10.13.5.81}]})
Execution failures:
0 error[s]
Node failures:
1) IllegalStateException on node
RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10:java.lang.IllegalStateException:
node(RegionOne/887351d5-2c16-48b9-927e-0ca5f13fcc10) didn't achieve the status
running; aborting after 1 seconds with final status: ERROR
at
org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:72)
at
org.jclouds.compute.functions.PollNodeRunning.apply(PollNodeRunning.java:45)
at
org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.call(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:121)
at
org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:146)
at
org.jclouds.compute.strategy.CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.apply(CustomizeNodeAndAddToGoodMapOrPutExceptionIntoBadMap.java:53)
at com.google.common.util.concurrent.Futures$1.apply(Futures.java:711)
at
com.google.common.util.concurrent.Futures$ChainingListenableFuture.run(Futures.java:849)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
1 error[s]
at
org.apache.axis2.util.Utils.getInboundFaultFromMessageContext(Utils.java:531)
at
org.apache.axis2.description.OutInAxisOperationClient.handleResponse(OutInAxisOperation.java:370)
at
org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:445)
at
org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:225)
at
org.apache.axis2.client.OperationClient.execute(OperationClient.java:149)
at
org.apache.stratos.cloud.controller.stub.CloudControllerServiceStub.startInstance(CloudControllerServiceStub.java:1407)
at
org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient.spawnAnInstance(CloudControllerClient.java:162)
... 23 more
TID: [0] [STRATOS] [2015-03-24 06:13:04,511] INFO
{org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient} -
Trying to spawn an instance via cloud controller: [cluster] lb01.lb01.domain
[partition] RegionOne-Core [lb-cluster] null [network-partition-id] N1
{org.apache.stratos.autoscaler.client.cloud.controller.CloudControllerClient}
TID: [0] [STRATOS] [2015-03-24 06:13:09,114] INFO
{org.wso2.carbon.databridge.core.DataBridge} - admin connected
{org.wso2.carbon.databridge.core.DataBridge}
TID: [0] [STRATOS] [2015-03-24 06:13:16,902] INFO
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl} -
Instance is successfully starting up. MemberContext
[memberId=lb01.lb01.domain64df2bd7-ee48-4ca2-9dec-362b72543d86,
nodeId=RegionOne/aa1a0e56-a722-444a-99f5-080ef844fb2d,
clusterId=lb01.lb01.domain, cartridgeType=lb01, privateIpAddresses=null,
publicIpAddresses=null, allocatedIpAddress=null, initTime=1427177584511,
lbClusterId=null, networkPartitionId=N1]
{org.apache.stratos.cloud.controller.impl.CloudControllerServiceImpl}