On 3/23/15, 6:11 AM, Udara Liyanage wrote:
Hi,

I could reproduce this in OpenStack. The region and image ID of the iaasProvider are null at the time the IP is released. When I set the region in cloud-controller.xml (which is not a solution, just for testing), it works without the issue.
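For illustration only, here is a minimal Java sketch of how the release path could avoid passing a null region to `getFloatingIPExtensionForZone()`. It assumes the region can come from the iaasProvider's `region` property, then from the partition's `region` property (the MemberContext in the traceback further down carries `region=RegionOne`), then from the `<region>/<server-id>` form of the instance ID (`RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d` in the same traceback). `RegionResolver` and `resolveRegion` are hypothetical names, not Stratos code.

```java
import java.util.Collections;
import java.util.Map;

/**
 * Hypothetical helper (not part of Stratos): works out which OpenStack region
 * ("zone" in the jclouds NovaApi) to use when releasing a floating IP, so that
 * getFloatingIPExtensionForZone() is never called with null.
 */
final class RegionResolver {

    private RegionResolver() {
    }

    static String resolveRegion(Map<String, String> iaasProviderProps,
                                Map<String, String> partitionProps,
                                String nodeId) {
        // 1. Explicit "region" property on the iaasProvider (the cloud-controller.xml workaround).
        String region = nonBlank(iaasProviderProps.get("region"));
        // 2. Otherwise, the partition's "region" property.
        if (region == null) {
            region = nonBlank(partitionProps.get("region"));
        }
        // 3. Otherwise, the prefix of a node ID of the form "<region>/<server-id>".
        if (region == null && nodeId != null) {
            int slash = nodeId.indexOf('/');
            if (slash > 0) {
                region = nodeId.substring(0, slash);
            }
        }
        // Fail with a descriptive message rather than an NPE deep inside jclouds.
        if (region == null) {
            throw new IllegalStateException("Cannot determine region for node " + nodeId);
        }
        return region;
    }

    // Treats null, empty and whitespace-only values as "not set".
    private static String nonBlank(String value) {
        return (value == null || value.trim().isEmpty()) ? null : value.trim();
    }

    public static void main(String[] args) {
        String region = resolveRegion(
                Collections.<String, String>emptyMap(),
                Collections.singletonMap("region", "RegionOne"),
                "RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d");
        System.out.println(region); // prints RegionOne
    }
}
```

Failing loudly when no region can be found at least turns the NPE into an actionable error; whether the partition property is the right fallback is a judgment call.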

[2015-03-23 15:25:23,067] INFO {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} - Member terminated: [member-id] single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06
[2015-03-23 15:25:23,076] INFO {org.apache.stratos.cloud.controller.messaging.publisher.TopologyEventPublisher} - Publishing member terminated event: [service-name] php [cluster-id] single-cartridge-app.my-php.php.domain [cluster-instance-id] single-cartridge-app-1 [member-id] single-cartridge-app.my-php.php.domaine4fb4a32-64b1-4804-877f-2e93748f6a06 [network-partition-id] network-partition-1 [partition-id] partition-1 [group-id] null
[2015-03-23 15:25:23,084]  INFO {org.apache.



Udara,

Thanks for looking at this.

I've confirmed that adding the following to the cloud-controller iaasProvider also seems to cover up the problem. I agree it's clearly not a solution.


@@ -13,4 +13,5 @@
         <property name="openstack.networking.provider" value="nova" />
        <property name="X" value="x" />
        <property name="Y" value="y" />
+       <property name="region" value="RegionOne" />
 </iaasProvider>

We'll file a bug to track this.

There's also the matter that after Stratos detects that the VM is inactive (as shown in the log snippet below at 18:57:51), the VM continues to be reported as "ACTIVE" in the topology events until it's terminated at 18:59:05. Is there logic in place that will return this VM to service if it is detected as healthy again before the CEP publishes the member fault event?



TID: [0] [STRATOS] [2015-03-23 18:57:51,932] WARN {org.apache.stratos.autoscaler.status.processor.group.GroupStatusInactiveProcessor} - Sending application instance inactive for [Application] cisco-sample-vm [ApplicationInstance] cisco-sample-vm-1
TID: [0] [STRATOS] [2015-03-23 18:57:51,941] INFO {org.apache.stratos.autoscaler.applications.topic.ApplicationsEventPublisher} - Publishing application inactivated event: [application] cisco-sample-vm [instance] cisco-sample-vm-1
TID: [0] [STRATOS] [2015-03-23 18:58:51,883] INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Faulty member detected [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965 with [last time-stamp] 1427136970708 [time-out] 60000 milliseconds
TID: [0] [STRATOS] [2015-03-23 18:58:51,884] INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} - Publishing member fault event for [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
.....

TID: [0] [STRATOS] [2015-03-23 18:59:05,887] INFO {org.apache.stratos.common.client.CloudControllerServiceClient} - Terminating instance via cloud controller: [member] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain4e32a138-4c48-46e0-a7aa-2949cd841965
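To make the question concrete, here is a simplified model of timeout-based fault detection as the log lines describe it (`[last time-stamp]` plus `[time-out] 60000 milliseconds`). It is an assumption about the behavior, not the `FaultHandlingWindowProcessor` implementation: any health event that arrives inside the window restarts the timer, and nothing in the model moves a member back to service once it has been flagged. `HeartbeatTracker` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified, hypothetical model of timeout-based fault detection (not the
 * FaultHandlingWindowProcessor source): a member is faulty once more than
 * timeoutMillis have passed since its last health event.
 */
final class HeartbeatTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a health event, which restarts the member's timeout window. */
    void onHealthEvent(String memberId, long timestampMillis) {
        lastSeen.put(memberId, timestampMillis);
    }

    /** Members that have never reported are not treated as faulty in this model. */
    boolean isFaulty(String memberId, long nowMillis) {
        Long last = lastSeen.get(memberId);
        return last != null && nowMillis - last > timeoutMillis;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(60000L);
        long lastTimestamp = 1427136970708L; // [last time-stamp] from the log above
        tracker.onHealthEvent("member-1", lastTimestamp);
        System.out.println(tracker.isFaulty("member-1", lastTimestamp + 30000L)); // false
        System.out.println(tracker.isFaulty("member-1", lastTimestamp + 60001L)); // true
    }
}
```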


-Vanson

On Mon, Mar 23, 2015 at 11:07 AM, Udara Liyanage <[email protected] <mailto:[email protected]>> wrote:

    Hi,

    I will have a look.

    On Mon, Mar 23, 2015 at 3:38 AM, Vanson Lim <[email protected] <mailto:[email protected]>> wrote:

        Devs,

        We are continuing to test the latest Stratos 4.1.0 codebase.

        This problem is seen only for VMs that have a floating IP. I've tested the non-floating-IP case and don't see any issues.

        The error returned by the jclouds API call is preventing Stratos from cleaning up its state.

        Stratos seems to throw tracebacks indefinitely as it repeatedly tries to terminate the faulty instance.
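A minimal sketch, assuming hypothetical `IaasOps` and `BestEffortTerminator` types rather than the real Stratos or jclouds classes, of treating the floating-IP release as best-effort: a failed release is logged and recorded, the node is still destroyed, and the caller can then go on to clean up the member's state instead of retrying the same failing call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical IaaS operations used during termination (not the jclouds API). */
interface IaasOps {
    void releaseAddress(String nodeId, String ip) throws Exception;

    void destroyNode(String nodeId) throws Exception;
}

/** Hypothetical terminator: address release failures do not block node destruction. */
final class BestEffortTerminator {
    private static final Logger LOG = Logger.getLogger(BestEffortTerminator.class.getName());
    private final IaasOps iaas;

    BestEffortTerminator(IaasOps iaas) {
        this.iaas = iaas;
    }

    /**
     * Releases each allocated IP, then destroys the node. Returns the IPs that
     * could not be released so they can be reported or cleaned up later; throws
     * only if destroying the node itself fails.
     */
    List<String> terminate(String nodeId, List<String> allocatedIps) throws Exception {
        List<String> leaked = new ArrayList<>();
        for (String ip : allocatedIps) {
            try {
                iaas.releaseAddress(nodeId, ip);
            } catch (Exception e) {
                // Log and continue instead of aborting the whole termination.
                LOG.log(Level.WARNING, "Could not release " + ip + " for " + nodeId, e);
                leaked.add(ip);
            }
        }
        iaas.destroyNode(nodeId);
        return leaked;
    }
}
```

Returning the unreleased addresses keeps a leaked floating IP visible (for example, 10.0.0.102 in the traceback below) without blocking termination.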

        Meanwhile, the "down" VM is still being reported as active in the topology events, which seems wrong. If Stratos detects that the VM is faulty, shouldn't it report that immediately in the topology events? Stratos currently has the following states defined, and none of them seem appropriate:

            Created
            Initialized
            Starting
            Active
            In_Maintenance
            ReadyToShutdown
            Suspended
            Terminated


        Do we need a new TIMED-OUT state that Stratos reports for a VM while it works to terminate it?
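If a new state were added, it might look something like the following. This is illustrative Java, not the Stratos source: the first eight constants mirror the list above, `Timed_Out` is the hypothetical proposed state, and the transition rules are assumptions. A member flagged as faulty would leave `Active` immediately, so topology consumers would stop treating it as serving while termination is retried.

```java
import java.util.EnumSet;
import java.util.Set;

/**
 * Illustrative only: the first eight constants mirror the member states listed
 * in this thread; Timed_Out is the hypothetical new state being proposed.
 */
enum MemberStatus {
    Created, Initialized, Starting, Active, In_Maintenance, ReadyToShutdown,
    Suspended, Terminated,
    /** Hypothetical: a fault was detected and termination is in progress. */
    Timed_Out;

    /** Only Active members are reported as serving; Timed_Out members are not. */
    boolean isServing() {
        return this == Active;
    }

    /** Hypothetical rule: a fault detected on an Active member moves it to Timed_Out. */
    boolean canTimeOut() {
        return this == Active;
    }

    /** Hypothetical rule: a Timed_Out member either finishes terminating or recovers. */
    static Set<MemberStatus> nextFromTimedOut() {
        return EnumSet.of(Terminated, Active);
    }

    public static void main(String[] args) {
        MemberStatus status = Active;
        if (status.canTimeOut()) {
            status = Timed_Out; // fault detected: stop reporting the member as Active
        }
        System.out.println(status + " serving=" + status.isServing()); // Timed_Out serving=false
    }
}
```

Allowing `Timed_Out` to return to `Active` is one possible answer to the recovery question raised earlier in the thread; dropping it from `nextFromTimedOut()` would make timed-out members terminal.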

        How to reproduce this issue:

        1) Start a sample cartridge instance that has a floating IP.
        2) Wait for the sample cartridge to become active.
        3) Terminate the sample VM via the OpenStack Horizon interface, and wait for Stratos to detect the error.


        Testing was done with a version of Stratos built from the following commit ID:

            commit 01dd9e491ad3acf7cc4e0f2895aaba336b82539d
            Author: R-Rajkumar <[email protected]> <mailto:[email protected]>
            Date:   Fri Mar 20 19:51:06 2015 +0530

                fixing an NPE in AS


        I've attached the full wso2carbon.log. Included below is the observed traceback:

        -Vanson


        TID: [0] [STRATOS] [2015-03-22 20:53:21,554] INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing member fault event for [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
        TID: [0] [STRATOS] [2015-03-22 20:54:06,386] INFO {org.apache.stratos.common.client.CloudControllerServiceClient} -  Terminating instance via cloud controller: [member] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
        TID: [0] [STRATOS] [2015-03-22 20:54:06,399] INFO {org.apache.stratos.cloud.controller.iaases.JcloudsIaas} -  Starting to terminate member: [cartridge-type] cisco-sample-vm [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd
        TID: [0] [STRATOS] [2015-03-22 20:54:06,450] ERROR {org.apache.stratos.cloud.controller.services.impl.InstanceTerminator} - Instance termination failed! MemberContext [applicationId=cisco-sample-vm, cartridgeType=cisco-sample-vm, clusterId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain, memberId=cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd, instanceId=RegionOne/83751110-4e5b-4aef-b6a3-c291c9eaad3d, partition=Partition [id=whole-region, description=null, isPublic=false, provider=Core, properties=Properties [properties=[Property [name=region, value=RegionOne]]]], defaultPrivateIP=172.16.2.17, defaultPublicIP=10.0.0.102, allocatedIPs=[10.0.0.102], publicIPs=[10.0.0.102], privateIPs=[172.16.2.17], initTime=1427057106433, lbClusterId=null, networkPartitionId=RegionOne, kubernetesPodId=null, kubernetesPodLabel=null, loadBalancingIPType=Private, instanceMetadata=org.apache.stratos.cloud.controller.domain.InstanceMetadata@5b176e44, properties=Properties [properties=[Property [name=PRIMARY, value=false], Property [name=MIN_COUNT, value=1]]]]
        java.lang.NullPointerException: arg[0] in {invocation=org.jclouds.openstack.nova.v2_0.NovaApi.public abstract com.google.common.base.Optional org.jclouds.openstack.nova.v2_0.NovaApi.getFloatingIPExtensionForZone(java.lang.String)[null],result={annotationParser={caller=NovaApi.getFloatingIPExtensionForZone[null]}}}
                at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:253)
                at org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:67)
                at org.jclouds.openstack.v2_0.functions.PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.apply(PresentWhenExtensionAnnotationNamespaceEqualsAnyNamespaceInExtensionsSet.java:43)
                at org.jclouds.rest.internal.DelegatesToInvocationFunction.propagateContextToDelegate(DelegatesToInvocationFunction.java:205)
                at org.jclouds.rest.internal.DelegatesToInvocationFunction.handle(DelegatesToInvocationFunction.java:154)
                at org.jclouds.rest.internal.DelegatesToInvocationFunction.invoke(DelegatesToInvocationFunction.java:123)
                at com.sun.proxy.$Proxy119.getFloatingIPExtensionForZone(Unknown Source)
                at org.apache.stratos.cloud.controller.iaases.openstack.networking.NovaNetworkingApi.releaseAddress(NovaNetworkingApi.java:239)
                at org.apache.stratos.cloud.controller.iaases.openstack.OpenstackIaas.releaseAddress(OpenstackIaas.java:239)
                at org.apache.stratos.cloud.controller.iaases.JcloudsIaas.destroyNode(JcloudsIaas.java:334)
                at org.apache.stratos.cloud.controller.iaases.JcloudsIaas.terminateInstance(JcloudsIaas.java:314)
                at org.apache.stratos.cloud.controller.services.impl.InstanceTerminator.run(InstanceTerminator.java:56)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:745)
        TID: [0] [STRATOS] [2015-03-22 20:54:21,563] INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Faulty member detected [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd with [last time-stamp] 1427057336960 [time-out] 60000 milliseconds
        TID: [0] [STRATOS] [2015-03-22 20:54:21,563] INFO {org.apache.stratos.cep.extension.FaultHandlingWindowProcessor} -  Publishing member fault event for [member-id] cisco-sample-vm.cisco-sample-vm.cisco-sample-vm.domain85d6eda0-1df5-4be2-b846-4817cc5292cd




--
    Udara Liyanage
    Software Engineer
    WSO2, Inc.: http://wso2.com <http://wso2.com/>
    lean. enterprise. middleware

    web: http://udaraliyanage.wordpress.com
    phone: +94 71 443 6897




