Aled Sage created JCLOUDS-1178:
----------------------------------

             Summary: AWS vm deletion failed: did not retry on response "The 
service is unavailable. Please try again shortly"
                 Key: JCLOUDS-1178
                 URL: https://issues.apache.org/jira/browse/JCLOUDS-1178
             Project: jclouds
          Issue Type: Bug
    Affects Versions: 1.9.2
            Reporter: Aled Sage


While using jclouds 1.9.2, while attempting to stop a VM in 
{{aws-ec2:us-east-1}} I hit the error below:

{noformat}
Failed after 39.1s: Error invoking stop at MySqlNodeImpl{id=zrxtil48g6}: 
AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ 
HTTP/1.1 failed with code 503, error: 
AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', 
code='Unavailable', message='The service is unavailable. Please try again 
shortly.', context='{Response=, Errors=}'}

org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException:
 Error invoking stop at MySqlNodeImpl{id=zrxtil48g6}: AWSResponseException: 
request POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 
503, error: AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', 
requestToken='null', code='Unavailable', message='The service is unavailable. 
Please try again shortly.', context='{Response=, Errors=}'}
        at 
org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException.propagate(EffectorUtils.java:318)
        at 
org.apache.brooklyn.core.mgmt.internal.EffectorUtils$EffectorCallPropagatedRuntimeException.access$100(EffectorUtils.java:285)
        at 
org.apache.brooklyn.core.mgmt.internal.EffectorUtils.handleEffectorException(EffectorUtils.java:326)
        at 
org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$2.handleException(EffectorTasks.java:90)
        at 
org.apache.brooklyn.util.core.task.DynamicSequentialTask.handleException(DynamicSequentialTask.java:469)
        at 
org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:417)
        at 
org.apache.brooklyn.util.core.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:519)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.brooklyn.util.exceptions.PropagatedRuntimeException: 
AWSResponseException: request POST https://ec2.us-east-1.amazonaws.com/ 
HTTP/1.1 failed with code 503, error: 
AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', 
code='Unavailable', message='The service is unavailable. Please try again 
shortly.', context='{Response=, Errors=}'}
        at 
org.apache.brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:129)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.doStop(MachineLifecycleEffectorTasks.java:863)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stop(MachineLifecycleEffectorTasks.java:743)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopEffectorBody.call(MachineLifecycleEffectorTasks.java:308)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopEffectorBody.call(MachineLifecycleEffectorTasks.java:305)
        at 
org.apache.brooklyn.core.effector.EffectorTasks$EffectorBodyTaskFactory$1.call(EffectorTasks.java:82)
        at 
org.apache.brooklyn.util.core.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:359)
        ... 5 more
Caused by: java.util.concurrent.ExecutionException: 
org.jclouds.aws.AWSResponseException: request POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: 
AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', 
code='Unavailable', message='The service is unavailable. Please try again 
shortly.', context='{Response=, Errors=}'}
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:188)
        at 
com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63)
        at org.apache.brooklyn.util.core.task.BasicTask.get(BasicTask.java:361)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.doStop(MachineLifecycleEffectorTasks.java:851)
        ... 10 more
Caused by: org.jclouds.aws.AWSResponseException: request POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1 failed with code 503, error: 
AWSError{requestId='d4a38d1b-4f02-4161-b7ff-6cd78bdefe41', requestToken='null', 
code='Unavailable', message='The service is unavailable. Please try again 
shortly.', context='{Response=, Errors=}'}
        at 
org.jclouds.aws.handlers.ParseAWSErrorFromXmlContent.handleError(ParseAWSErrorFromXmlContent.java:75)
        at 
org.jclouds.http.handlers.DelegatingErrorHandler.handleError(DelegatingErrorHandler.java:67)
        at 
org.jclouds.http.internal.BaseHttpCommandExecutorService.shouldContinue(BaseHttpCommandExecutorService.java:136)
        at 
org.jclouds.http.internal.BaseHttpCommandExecutorService.invoke(BaseHttpCommandExecutorService.java:105)
        at 
org.jclouds.rest.internal.InvokeHttpMethod.invoke(InvokeHttpMethod.java:90)
        at 
org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:73)
        at 
org.jclouds.rest.internal.InvokeHttpMethod.apply(InvokeHttpMethod.java:44)
        at 
org.jclouds.reflect.FunctionalReflection$FunctionalInvocationHandler.handleInvocation(FunctionalReflection.java:117)
        at 
com.google.common.reflect.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:87)
        at com.sun.proxy.$Proxy160.describeInstancesInRegion(Unknown Source)
        at 
org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getRunningInstanceInRegion(EC2GetNodeMetadataStrategy.java:64)
        at 
org.jclouds.aws.ec2.compute.strategy.AWSEC2GetNodeMetadataStrategy.getRunningInstanceInRegion(AWSEC2GetNodeMetadataStrategy.java:52)
        at 
org.jclouds.ec2.compute.strategy.EC2GetNodeMetadataStrategy.getNode(EC2GetNodeMetadataStrategy.java:56)
        at 
org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:42)
        at 
org.jclouds.compute.predicates.AtomicNodeTerminated.refreshOrNull(AtomicNodeTerminated.java:28)
        at 
org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:46)
        at 
org.jclouds.compute.predicates.internal.TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.apply(TrueIfNullOrDeletedRefreshAndDoubleCheckOnFalse.java:31)
        at 
org.jclouds.util.Predicates2$RetryablePredicate.apply(Predicates2.java:117)
        at 
org.jclouds.compute.internal.BaseComputeService.doDestroyNode(BaseComputeService.java:302)
        at 
org.jclouds.compute.internal.BaseComputeService.destroyNode(BaseComputeService.java:243)
        at 
org.apache.brooklyn.location.jclouds.JcloudsLocation.releaseNode(JcloudsLocation.java:2701)
        at 
org.apache.brooklyn.location.jclouds.JcloudsLocation.release(JcloudsLocation.java:2651)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks.stopAnyProvisionedMachines(MachineLifecycleEffectorTasks.java:1021)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:875)
        at 
org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$StopAnyProvisionedMachinesTask.call(MachineLifecycleEffectorTasks.java:873)
        ... 6 more
{noformat}

Digging into the jclouds wire log, I see:

{noformat}
2016-09-13 21:00:09,811 DEBUG o.j.r.i.InvokeHttpMethod 
[brooklyn-execmanager-YxAEEeJK-3722]: >> invoking DescribeInstances
2016-09-13 21:00:09,812 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[brooklyn-execmanager-YxAEEeJK-3722]: Sending request -1247513718: POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1
2016-09-13 21:00:09,812 DEBUG jclouds.wire 
[brooklyn-execmanager-YxAEEeJK-3722]: >> 
"Action=DescribeInstances&InstanceId.1=i-331bcea0&Version=2012-06-01"
2016-09-13 21:00:09,812 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: >> POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1
2016-09-13 21:00:09,812 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: >> Host: ec2.us-east-1.amazonaws.com
2016-09-13 21:00:09,812 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: >> X-Amz-Date: 20160913T200009Z
2016-09-13 21:00:09,812 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: >> Authorization: AWS4-HMAC-SHA256 
Credential=AAAAAAAAAAAAAAAAAAAA/20160913/us-east-1/ec2/aws4_request, 
SignedHeaders=content-type;host;x-amz-date, 
Signature=84cf762b171d56299b096a106733e72bbf96d5654b4c607135cea8cd7be51689
2016-09-13 21:00:09,812 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: >> Content-Type: 
application/x-www-form-urlencoded
2016-09-13 21:00:09,812 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: >> Content-Length: 67
2016-09-13 21:00:10,294 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[brooklyn-execmanager-YxAEEeJK-3722]: Receiving response -1247513718: HTTP/1.1 
503 Service Unavailable
2016-09-13 21:00:10,294 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: << HTTP/1.1 503 Service Unavailable
2016-09-13 21:00:10,294 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: << Date: Tue, 13 Sep 2016 20:00:10 GMT
2016-09-13 21:00:10,294 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: << Transfer-Encoding: chunked
2016-09-13 21:00:10,294 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: << Connection: close
2016-09-13 21:00:10,294 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: << Server: AmazonEC2
2016-09-13 21:00:10,294 DEBUG jclouds.headers 
[brooklyn-execmanager-YxAEEeJK-3722]: << Content-Type: application/unknown
2016-09-13 21:00:10,296 DEBUG jclouds.wire 
[brooklyn-execmanager-YxAEEeJK-3722]: << "<?xml version="1.0" 
encoding="UTF-8"?>[\n]"
2016-09-13 21:00:10,296 DEBUG jclouds.wire 
[brooklyn-execmanager-YxAEEeJK-3722]: << 
"<Response><Errors><Error><Code>Unavailable</Code><Message>The service is 
unavailable. Please try again 
shortly.</Message></Error></Errors><RequestID>d4a38d1b-4f02-4161-b7ff-6cd78bdefe41</RequestID></Response>"
{noformat}

In the jclouds code 
{{org.jclouds.aws.config.AWSHttpApiModule.provideRetryableServerCodes}}, I see 
that "Unavailable" is not a code that for which we'll retry (we'll only retry 
on server errors "RequestLimitExceeded" and "InternalError".
(These codes are passed into 
{{AWSServerErrorRetryHandler.shouldRetryRequestOnError}}.)

Looking at 
http://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html, I 
think we should also retry on:
* {{ServiceUnavailable}}: The request has failed due to a temporary failure of 
the server.
* {{Unavailable}}: The server is overloaded and can't handle the request.

I also wonder for client errors whether we should retry on:
* {{RequestExpired}} (given that we retry on {{SignatureDoesNotMatch}} - will 
we automatically reauthenticate?).
* {{ConcurrentTagAccess}}: You can't run simultaneous commands to modify a tag 
for a specific resource. Allow sufficient wait time for the previous request to 
complete, then retry your request. For more information, see Error Retries and 
Exponential Backoff in AWS.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to