[
https://issues.apache.org/jira/browse/STRATOS-1282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384279#comment-14384279
]
Vanson Lim commented on STRATOS-1282:
-------------------------------------
Udara,
Thanks for the fix. I've verified on my setup that I no longer see the
traceback, and this test case seems to be behaving properly now.
I took a look at the diffs associated with this commit and have some minor
comments.
-Vanson
> Stratos4.1.0 - error cleaning up VMs (that have floatingip) terminated
> through Openstack horizon
> ------------------------------------------------------------------------------------------------
>
> Key: STRATOS-1282
> URL: https://issues.apache.org/jira/browse/STRATOS-1282
> Project: Stratos
> Issue Type: Bug
> Components: Cloud Controller
> Affects Versions: 4.1.0 Beta
> Reporter: Martin Eppel
> Priority: Blocker
>
> On 3/23/15, 6:11 AM, Udara Liyanage wrote:
> Hi,
> I could reproduce this in Openstack. The region and image id of the
> iaasProvider is null at the time of IP releasing. When I set the region in
> cloud-controller.xml (which is not a solution, just for testing) it works
> without the issue.
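A minimal sketch of the kind of guard that would avoid passing a null region into the floating IP lookup. It assumes the region can be taken from the member's partition properties when the provider-level value is missing. Every name here (FloatingIpReleaseSketch, FloatingIpReleaser, resolveRegion) is hypothetical and not a Stratos or jclouds API, and this is not necessarily how the committed fix works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch; none of these names are Stratos or jclouds APIs. */
class FloatingIpReleaseSketch {

    /** Stand-in for whatever component releases a floating IP in a region. */
    interface FloatingIpReleaser {
        void release(String region, String ip);
    }

    /**
     * Picks the provider-level region if it is set, otherwise the region
     * carried by the partition (assumption: the partition knows its region).
     */
    static Optional<String> resolveRegion(String providerRegion, String partitionRegion) {
        if (providerRegion != null && !providerRegion.isEmpty()) {
            return Optional.of(providerRegion);
        }
        if (partitionRegion != null && !partitionRegion.isEmpty()) {
            return Optional.of(partitionRegion);
        }
        return Optional.empty();
    }

    /** Returns true if a release was attempted, false if no region was known. */
    static boolean releaseAddress(FloatingIpReleaser releaser, String providerRegion,
                                  String partitionRegion, String ip) {
        Optional<String> region = resolveRegion(providerRegion, partitionRegion);
        if (!region.isPresent()) {
            // Skip the call instead of handing null to the downstream API.
            return false;
        }
        releaser.release(region.get(), ip);
        return true;
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        FloatingIpReleaser recorder = (region, ip) -> calls.add(region + ":" + ip);
        // Provider region is null, so the partition value is used.
        System.out.println(releaseAddress(recorder, null, "RegionOne", "10.0.0.102"));
        System.out.println(calls);
    }
}
```

Returning false rather than throwing lets the caller decide whether a missing region should block the rest of the cleanup; that choice is also an assumption.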
> [2015-03-23 15:25:23,067] INFO
> {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} - Member
> terminated: [member-id]
> single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
> [2015-03-23 15:25:23,076] INFO
> {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher}
> - Publishing member terminated event: [service-name] php [cluster-id]
> single-cartridge-app.my-php.php.domain [cluster-instance-id]
> single-cartridge-app-1 [member-id]
> single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
> [network-partition-id] network-partition-1 [partition-id] partition-1
> [group-id] null
> [2015-03-23 15:25:23,084] INFO {org.apache.
> Udara,
> Thanks for looking at this.
> I've confirmed that adding the following to the cloud-controller iaasProvider
> also seems to cover up the problem. I agree, this is clearly not a solution.
> @@ -13,4 +13,5 @@
> <property name="openstack.networking.provider" value="nova" />
> <property name="X" value="x" />
> <property name="Y" value="y" />
> + <property name="region" value="RegionOne" />
> </iaasProvider>
> We'll file a bug to track this.
> There's also the matter that after Stratos detects that the VM is inactive
> (as shown in the log snippet below at 18:57:51), the VM continues to be reported
> as "ACTIVE" in the topology
> events until it's terminated at 18:59:05. Is there logic in place that
> will return this VM to service if the VM is detected before the CEP publishes
> the member fault event?
> TID: [0] [STRATOS] [2015-03-23 18:57:51,932] WARN
> {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor}
> - Sending application instance inactive for [Application] cisco-sample-vm
> [ApplicationInstance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:57:51,941] INFO
> {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher}
> - Publishing application inactivated event: [application] cisco-sample-vm
> [instance] cisco-sample-vm-1
> TID: [0] [STRATOS] [2015-03-23 18:58:51,883] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Faulty
> member detected [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> with [last time-stamp] 1427136970708 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-23 18:58:51,884] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing
> member fault event for [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
> .....
> TID: [0] [STRATOS] [2015-03-23 18:59:05,887] INFO
> {org.apache.stratos.common.client.CloudControllerServiceClient} -
> Terminating instance via cloud controller: [member]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
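The "[last time-stamp] ... [time-out] 60000 milliseconds" fields in the log above suggest a simple elapsed-time check. The sketch below shows that arithmetic only. FaultTimeoutSketch and isFaulty are hypothetical names, the strict greater-than comparison is an assumption, and the actual FaultHandlingWindowProcessor logic is not shown in this thread.

```java
/** Hypothetical sketch of a last-seen timeout check; not the Stratos CEP code. */
class FaultTimeoutSketch {

    /**
     * A member is treated as faulty when more than timeoutMillis have elapsed
     * since its last event. Whether the real check is strict is an assumption.
     */
    static boolean isFaulty(long lastEventMillis, long nowMillis, long timeoutMillis) {
        return nowMillis - lastEventMillis > timeoutMillis;
    }

    public static void main(String[] args) {
        long lastEvent = 1427136970708L; // "[last time-stamp]" from the log
        long timeout = 60000L;           // "[time-out]" from the log
        System.out.println(isFaulty(lastEvent, lastEvent + 30000L, timeout)); // false
        System.out.println(isFaulty(lastEvent, lastEvent + 60001L, timeout)); // true
    }
}
```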
> -Vanson
> On Mon, Mar 23, 2015 at 11:07 AM, Udara Liyanage <[email protected]> wrote:
> Hi,
> I will have a look.
> On Mon, Mar 23, 2015 at 3:38 AM, Vanson Lim <[email protected]> wrote:
> Devs,
> We are continuing to work on testing the latest stratos 4.1.0 codebase.
> This problem is seen only for VMs that have a floating IP. I've tested with
> the non-floating-IP case and don't see issues.
> The error returned from the jclouds API call is preventing Stratos from
> cleaning up its state.
> Stratos seems to forever throw tracebacks as it repeatedly tries to terminate
> the faulty instance.
> Meanwhile, the "down" VM is still being reported as active in the topology
> events, which seems wrong. If Stratos detects that the VM is faulty,
> shouldn't it report it immediately in the topology events? Stratos currently
> has the following states defined and none of them seem to be appropriate.
> Created
> Initialized
> Starting
> Active
> In_Maintenance
> ReadyToShutdown
> Suspended
> Terminated
> Do we need a new TIMED-OUT state that Stratos reports for the VM while it
> works to terminate it?
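To make the question concrete, here is a minimal sketch of the state list above with the proposed extra state added. MemberStateSketch, TIMED_OUT, isServing and onFaultDetected are all hypothetical; the real Stratos member status type and its transitions may look quite different.

```java
/** Hypothetical sketch of the member states listed above plus a proposed one. */
enum MemberStateSketch {
    CREATED,
    INITIALIZED,
    STARTING,
    ACTIVE,
    IN_MAINTENANCE,
    READY_TO_SHUTDOWN,
    SUSPENDED,
    TERMINATED,
    TIMED_OUT; // proposed in this thread, not an existing Stratos state

    /** Only ACTIVE members would be reported as serving traffic. */
    boolean isServing() {
        return this == ACTIVE;
    }

    /**
     * Assumed transition: an ACTIVE member moves to TIMED_OUT as soon as a
     * fault is detected; other states are left unchanged.
     */
    static MemberStateSketch onFaultDetected(MemberStateSketch current) {
        return current == ACTIVE ? TIMED_OUT : current;
    }

    public static void main(String[] args) {
        MemberStateSketch next = onFaultDetected(ACTIVE);
        System.out.println(next + " serving=" + next.isServing());
    }
}
```

With a state like this, topology events could stop reporting the member as active between fault detection and termination.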
> How to reproduce this issue:
> 1) Start a sample cartridge instance that has a floating IP.
> 2) Wait for the sample cartridge to become active.
> 3) Terminate the sample VM via the OpenStack Horizon interface, and wait for
> Stratos to detect the VM error.
> Testing using a version of stratos built off the following commit id:
> commit 01dd9e491ad3acf7cc4e0f2895aaba336b82539d
> Author: R-Rajkumar <[email protected]>
> Date: Fri Mar 20 19:51:06 2015 +0530
> fixing an NPE in AS
> I've attached the full wso2carbon.log. Included below is the observed
> traceback:
> -Vanson
> TID: [0] [STRATOS] [2015-03-22 20:53:21,554] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing
> member fault event for [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,386] INFO
> {org.apache.stratos.common.client.CloudControllerServiceClient} -
> Terminating instance via cloud controller: [member]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,399] INFO
> {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} - Starting to
> terminate member: [cartridge-type] cisco-sample-vm [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> TID: [0] [STRATOS] [2015-03-22 20:54:06,450] ERROR
> {org.apache.stratos.cloud.controller.services.impl.InstanceTerminator} -
> Instance termination failed! MemberContext [applicationId=cisco-sample-vm,
> cartridgeType=cisco-sample-vm,
> clusterId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain,
> memberId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd,
> instanceId=RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d,
> partition=Partition [id=whole-region, description=null, isPublic=false,
> provider=Core, properties=Properties [properties=[Property [name=region,
> value=RegionOne]]]], defaultPrivateIP=172.16.2.17,
> defaultPublicIP=10.0.0.102, allocatedIPs=[10.0.0.102],
> publicIPs=[10.0.0.102], privateIPs=[172.16.2.17], initTime=1427057106433,
> lbClusterId=null, networkPartitionId=RegionOne, kubernetesPodId=null,
> kubernetesPodLabel=null, loadBalancingIPType=Private,
> instanceMetadata=org.apache.stratos.cloud.controller.domain.InstanceMetadata@5b176e44,
> properties=Properties [properties=[Property [name=PRIMARY, value=false],
> Property [name=MIN_COUNT, value=1]]]]
> java.lang.NullPointerException: arg[0] in
> {invocation=org.jclouds.openstack.nova.v2_0.NovaApi.public abstract
> com.google.common.base.Optional
> org.jclouds.openstack.nova.v2_0.NovaApi.getFloatingIPExtensionForZone(java.lang.String)[null],result={annotationParser={caller=NovaApi.getFloatingIPExtensionForZone[null]}}}
> at
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253)
> at
> org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:67)
> at
> org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:43)
> at
> org.jclouds.rest.internal.DelegatesToInvocationFunction.propagateContextToDelegate(DelegatesToInvocationFunction.java:205)
> at
> org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:154)
> at
> org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
> at com.sun.proxy.$Proxy119.getFloatingIPExtensionForZone(Unknown
> Source)
> at
> org.apache.stratos.cloud.controller.iaases.openstack.networking.NovaNetworkingApi.releaseAddress(NovaNetworkingApi.java:239)
> at
> org.apache.stratos.cloud.controller.iaases.openstack.OpenstackIaas.releaseAddress(OpenstackIaas.java:239)
> at
> org.apache.stratos.cloud.controller.iaases.JcloudsIaas.destroyNode(JcloudsIaas.java:334)
> at
> org.apache.stratos.cloud.controller.iaases.JcloudsIaas.terminateInstance(JcloudsIaas.java:314)
> at
> org.apache.stratos.cloud.controller.services.impl.InstanceTerminator.run(InstanceTerminator.java:56)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Faulty
> member detected [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> with [last time-stamp] 1427057336960 [time-out] 60000 milliseconds
> TID: [0] [STRATOS] [2015-03-22 20:54:21,563] INFO
> {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing
> member fault event for [member-id]
> cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
> --
> Udara Liyanage
> Software Engineer
> WSO2, Inc.: http://wso2.com
> lean. enterprise. middleware
> web: http://udaraliyanage.wordpress.com
> phone: +94 71 443 6897