[ https://issues.apache.org/jira/browse/BROOKLYN-535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185586#comment-16185586 ]
ASF GitHub Bot commented on BROOKLYN-535:
-----------------------------------------

Github user tbouron commented on a diff in the pull request:

    https://github.com/apache/brooklyn-server/pull/829#discussion_r141829990

    --- Diff: policy/src/main/java/org/apache/brooklyn/policy/ha/ServiceRestarter.java ---
    @@ -133,6 +135,10 @@ protected synchronized void onDetectedFailure(SensorEvent<Object> event) {
                 LOG.warn("ServiceRestarter suspended, so not acting on failure detected at "+entity+" ("+event.getValue()+")");
                 return;
             }
    +        if (isEntityStopping()) {
    +            highlightViolation("Failure detected but entity stopping");
    +            LOG.info("Entity stopping, so ServiceRestarter not acting on failure detected at "+entity+" ("+event.getValue()+")");
    --- End diff --

    You are missing a `return;`

> AWS VM termination failed: did not retry DescribeInstances on SocketTimeoutException
> ------------------------------------------------------------------------------------
>
>                 Key: BROOKLYN-535
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-535
>             Project: Brooklyn
>          Issue Type: Bug
>    Affects Versions: 0.11.0
>            Reporter: Aled Sage
>
> In the same run as I experienced https://issues.apache.org/jira/browse/BROOKLYN-533...
> I deployed an app with approx 100 VMs in AWS.
> I then stopped my app, thus terminating all the VMs. However, for one thread the initial call to {{DescribeInstances}} failed with a {{SocketTimeoutException}}. This caused the jclouds {{releaseNode}} call to abort, so my VM was not terminated.
> Ideally, this would have been retried because {{DescribeInstances}} is safe to retry (even though it is a POST request):
> A snippet from the log is shown below:
> {noformat}
> 2017-09-15T17:33:20,767 DEBUG 107 o.j.r.i.InvokeHttpMethod [r-VlI23lev-81201] >> invoking DescribeInstances
> 2017-09-15T17:33:20,767 DEBUG 107 o.j.h.i.JavaUrlHttpCommandExecutorService [r-VlI23lev-81201] Sending request 908012800: POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1
> 2017-09-15T17:33:20,768 DEBUG 107 j.headers [r-VlI23lev-81201] >> POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1
> 2017-09-15T17:33:20,768 DEBUG 107 j.headers [r-VlI23lev-81201] >> Host: ec2.eu-west-1.amazonaws.com
> 2017-09-15T17:33:20,768 DEBUG 107 j.headers [r-VlI23lev-81201] >> X-Amz-Date: 20170915T173320Z
> 2017-09-15T17:33:20,768 DEBUG 107 j.headers [r-VlI23lev-81201] >> Authorization: AWS4-HMAC-SHA256 Credential=xxxxxxxx/20170915/eu-west-1/ec2/aws4_request, SignedHeaders=content-type;host;x-amz-date, Signature=4ad5e8ad6a3c3e250e598d50f03bb391ec58ca2e47ebe23d0a0fa1099e814bd7
> 2017-09-15T17:33:20,768 DEBUG 107 j.headers [r-VlI23lev-81201] >> Content-Type: application/x-www-form-urlencoded
> 2017-09-15T17:33:20,768 DEBUG 107 j.headers [r-VlI23lev-81201] >> Content-Length: 76
> 2017-09-15T17:34:20,808 ERROR 107 o.j.h.i.JavaUrlHttpCommandExecutorService [r-VlI23lev-81201] Command not considered safe to retry because request method is POST: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract java.util.Set org.jclouds.aws.ec2.features.AWSInstanceApi.describeInstancesInRegion(java.lang.String,java.lang.String[])[eu-west-1, [Ljava.lang.String;@63ac6f54], request=POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1]
> 2017-09-15T17:34:20,809 ERROR 127 o.a.b.l.j.JcloudsLocation [r-VlI23lev-81201] Problem releasing machine SshMachineLocation[54.77.31.96:a...@ec2-54-77-31-96.eu-west-1.compute.amazonaws.com/54.77.31.96:22(id=dwicvm1dq8)] in JcloudsLocation[AWS Dublin:xxxxxxxx@xxxxxxxx], instance id eu-west-1/i-0610f4ffd584cb796; ignoring and continuing, will throw subsequently: org.jclouds.http.HttpResponseException: Read timed out connecting to POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1
> org.jclouds.http.HttpResponseException: Read timed out connecting to POST https://ec2.eu-west-1.amazonaws.com/ HTTP/1.1
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:122) [101:jclouds-core:2.0.2.2-20170712_1657]
>     at org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90) [101:jclouds-core:2.0.2.2-20170712_1657]
>     at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73) [101:jclouds-core:2.0.2.2-20170712_1657]
>     at org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44) [101:jclouds-core:2.0.2.2-20170712_1657]
>     at org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117) [101:jclouds-core:2.0.2.2-20170712_1657]
>     at com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87) [66:com.google.guava:18.0.0]
>     at com.sun.proxy.$Proxy179.describeInstancesInRegion(Unknown Source) [47:aws-ec2:2.0.2]
>     at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getRunningInstanceInRegion(EC2GetNodeMetadataStrategy.java:64) [77:ec2:2.0.2]
>     at org.jclouds.aws.ec2.compute.strategy.AWSEC2GetNodeMetadataStrategy.getRunningInstanceInRegion(AWSEC2GetNodeMetadataStrategy.java:52) [47:aws-ec2:2.0.2]
>     at org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getNode(EC2GetNodeMetadataStrategy.java:56) [77:ec2:2.0.2]
>     at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:42) [100:jclouds-compute:2.0.2]
>     at org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:28) [100:jclouds-compute:2.0.2]
>     at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:46) [100:jclouds-compute:2.0.2]
>     at org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:31) [100:jclouds-compute:2.0.2]
>     at org.jclouds.util.Predicates2$RetryablePredicate.apply(Predicates2.java:117) [101:jclouds-core:2.0.2.2-20170712_1657]
>     at org.jclouds.compute.internal.BaseComputeService.doDestroyNode(BaseComputeService.java:309) [100:jclouds-compute:2.0.2]
>     at org.jclouds.compute.internal.BaseComputeService.destroyNode(BaseComputeService.java:250) [100:jclouds-compute:2.0.2]
>     at org.apache.brooklyn.location.jclouds.JcloudsLocation.releaseNode(JcloudsLocation.java:2189) [127:org.apache.brooklyn.locations-jclouds:0.12.0.SNAPSHOT]
>     at org.apache.brooklyn.location.jclouds.JcloudsLocation.release(JcloudsLocation.java:2141) [127:org.apache.brooklyn.locations-jclouds:0.12.0.SNAPSHOT]
>     at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stopAnyProvisionedMachines(MachineLifecycleEffectorTasks.java:1033) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>     at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:883) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>     at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:880) [131:org.apache.brooklyn.software-base:0.12.0.SNAPSHOT]
>     at org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:363) [122:org.apache.brooklyn.core:0.12.0.SNAPSHOT]
>     at org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:529) [122:org.apache.brooklyn.core:0.12.0.SNAPSHOT]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
>     at java.lang.Thread.run(Thread.java:748) [?:?]
> Caused by: java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:?]
>     at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:?]
>     at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:?]
>     at sun.security.ssl.InputRecord.readFully(InputRecord.java:465) ~[?:?]
>     at sun.security.ssl.InputRecord.read(InputRecord.java:503) ~[?:?]
>     at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:983) ~[?:?]
>     at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1385) ~[?:?]
>     at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1413) ~[?:?]
>     at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1397) ~[?:?]
>     at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559) ~[?:?]
>     at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185) ~[?:?]
>     at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316) ~[?:?]
>     at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291) ~[?:?]
>     at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250) ~[?:?]
>     at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.writePayloadToConnection(JavaUrlHttpCommandExecutorService.java:295) ~[?:?]
>     at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:171) ~[?:?]
>     at org.jclouds.http.internal.JavaUrlHttpCommandExecutorService.convert(JavaUrlHttpCommandExecutorService.java:65) ~[?:?]
>     at org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:99) ~[?:?]
>     ... 27 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
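
For illustration, the retry behaviour requested in the issue could look roughly like the sketch below. It is not the jclouds or Brooklyn implementation: `IdempotentRetry`, `callWithRetry` and `isReadTimeout` are hypothetical names, and the lambda in `main` is only a stand-in for a read-only call such as DescribeInstances. The assumption is that the caller knows the operation is safe to repeat, whatever HTTP method carries it, so only `SocketTimeoutException` (directly or as a cause) triggers a retry.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration only; not part of jclouds or Brooklyn. */
final class IdempotentRetry {
    private IdempotentRetry() {}

    /**
     * Calls op up to maxAttempts times. A failure is retried only when a
     * SocketTimeoutException appears in its cause chain; any other failure
     * is rethrown immediately. The caller must know that op is safe to repeat.
     */
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (!isReadTimeout(e)) {
                    throw e; // not a timeout, so do not retry
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                }
            }
        }
        throw last; // every attempt timed out
    }

    /** Returns true if t or any of its causes is a SocketTimeoutException. */
    static boolean isReadTimeout(Throwable t) {
        for (Throwable c = t; c != null; c = c.getCause()) {
            if (c instanceof SocketTimeoutException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a read-only lookup: times out once, then succeeds.
        String state = callWithRetry(() -> {
            if (calls.incrementAndGet() == 1) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "terminated";
        }, 3, 10L);
        System.out.println(state + " after " + calls.get() + " attempts");
    }
}
```

Walking the cause chain matters here because, as in the stack trace above, the timeout reaches the caller wrapped in another exception (`HttpResponseException` caused by `SocketTimeoutException`).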