[
https://issues.apache.org/jira/browse/BROOKLYN-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15681915#comment-15681915
]
ASF GitHub Bot commented on BROOKLYN-394:
-----------------------------------------
GitHub user aledsage opened a pull request:
https://github.com/apache/brooklyn-server/pull/448
BROOKLYN-394: increase jclouds retry/backoff time
Question: Are 500ms and 6 retries sensible values? It feels to me like a
large backoff is good for API calls to a cloud. I can see this might slow
things down in some situations (e.g. after a transient connectivity
problem), but that seems unlikely to happen often. In all the important
cases I can think of, a larger backoff + retry time seems desirable.
When running `testCreateMany` to provision 20 VMs concurrently in AWS,
I triggered rate-limiting on the `RunInstances` calls, getting back
`503 Service Unavailable` for 6 of the 20 VMs:
```
grep -E "JavaUrlHttpCommandExecutorService.*Receiving.* 503 Service Unavailable" brooklyn.debug.log
2016-11-20 21:41:07,014 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-7]: Receiving response 305126632: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:07,027 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-17]: Receiving response -202425525: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:07,181 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-20]: Receiving response 1461817670: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:07,902 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-7]: Receiving response -412329992: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:07,951 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-17]: Receiving response -2106831550: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:08,094 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-20]: Receiving response -1404718861: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:09,575 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-11]: Receiving response 1776862310: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:11,419 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-15]: Receiving response 1334001839: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service Unavailable
```
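The "6 of the 20" figure can be cross-checked by counting the distinct thread names in those matches. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes each log entry sits on a single line in `brooklyn.debug.log` and that each VM is provisioned on its own thread.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Brooklyn or jclouds.
class ThrottledThreads {
    // Captures the thread name from a single-line entry that reports a 503.
    private static final Pattern LINE = Pattern.compile(
            "JavaUrlHttpCommandExecutorService \\[([^\\]]+)\\]: "
            + "Receiving response .* 503 Service Unavailable");

    // Returns the distinct thread names that received at least one 503.
    static Set<String> threadsWith503(List<String> lines) {
        Set<String> threads = new TreeSet<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                threads.add(m.group(1));
            }
        }
        return threads;
    }

    public static void main(String[] args) {
        // Three entries copied from the logs in this message; the last is a 200.
        List<String> sample = List.of(
                "2016-11-20 21:41:07,014 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService "
                + "[pool-3-thread-7]: Receiving response 305126632: HTTP/1.1 503 Service Unavailable",
                "2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService "
                + "[pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable",
                "2016-11-20 21:41:17,536 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService "
                + "[pool-3-thread-13]: Receiving response 1803030217: HTTP/1.1 200 OK");
        System.out.println(threadsWith503(sample).size() + " thread(s) saw a 503");
    }
}
```

Applied to the full grep output above, the same counting gives six distinct thread names (7, 11, 13, 15, 17 and 20), matching the 6 affected VMs.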
Here's the output for one of them:
```
2016-11-20 21:41:07,774 DEBUG o.j.r.i.InvokeHttpMethod [pool-3-thread-13]: >> invoking RunInstances
2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:08,191 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 1/6: delaying for 541 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:09,143 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 2/6: delaying for 2143 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service Unavailable
2016-11-20 21:41:11,697 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 3/6: delaying for 4681 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
2016-11-20 21:41:17,536 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1803030217: HTTP/1.1 200 OK
```
Note that some of the `RunInstances` calls didn't succeed until we'd backed
off multiple times: in the example above, the call only worked on the 4th
attempt, after a 4.7 second backoff. I therefore suspect we were actually
making things *worse* when we retried after 50ms, 100ms, 200ms, 400ms and
800ms (e.g. making concurrent calls from other threads a lot more likely to
fail, while not succeeding in any of the 5 retries).
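The three delays logged for `pool-3-thread-13` (541, 2143 and 4681 ms) can be checked against a simple model. The sketch below assumes a base delay of `start * retry^2` plus random jitter of under 10% of that base. That formula is inferred from the logged values with a 500ms start, not taken from the jclouds source, and the class and method names are hypothetical. It is only a fit to this one log; the doubling sequence quoted above for the earlier settings is not modeled here.

```java
// Hypothetical model of the logged backoff delays; not jclouds code.
class BackoffSketch {
    // Assumed base delay for the given retry number (1-based): start * retry^2.
    static long baseDelayMs(long startMs, int retry) {
        return startMs * retry * retry;
    }

    // True if the observed delay lies in [base, base + 10% of base).
    static boolean consistent(long startMs, int retry, long observedMs) {
        long base = baseDelayMs(startMs, retry);
        return observedMs >= base && observedMs < base + base / 10;
    }

    public static void main(String[] args) {
        long startMs = 500;                  // start delay proposed in this pull request
        long[] observed = {541, 2143, 4681}; // delays logged for retries 1..3
        long total = 0;
        for (int retry = 1; retry <= observed.length; retry++) {
            long logged = observed[retry - 1];
            total += logged;
            System.out.printf("retry %d: base %d ms, logged %d ms, within jitter band: %b%n",
                    retry, baseDelayMs(startMs, retry), logged,
                    consistent(startMs, retry, logged));
        }
        // Sum of the three logged backoff delays before the successful 4th attempt.
        System.out.println("total logged backoff: " + total + " ms");
    }
}
```

Under this model the base delays for retries 1 to 3 are 500, 2000 and 4500 ms, and the three logged delays sum to 7365 ms of waiting before the call that returned `200 OK`.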
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/aledsage/brooklyn-server BROOKLYN-394-retry-backoff-time
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/brooklyn-server/pull/448.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #448
----
commit 18cdc98d36f74da10d8987382dba77994de3b75d
Author: Aled Sage <[email protected]>
Date: 2016-11-20T21:52:51Z
BROOKLYN-394: increase jclouds retry/backoff time
----
> "Request limit exceeded" on Amazon
> ----------------------------------
>
> Key: BROOKLYN-394
> URL: https://issues.apache.org/jira/browse/BROOKLYN-394
> Project: Brooklyn
> Issue Type: Bug
> Reporter: Svetoslav Neykov
>
> Any moderately sized blueprint could trigger {{Request limit exceeded}} on
> Amazon (say kubernetes). The only way users have control over the request
> rate is by setting {{maxConcurrentMachineCreations}} with the current
> recommended value of 3 (see clocker.io).
> It's a bad user experience if one needs to adapt the location based on the
> blueprint.
> Possible steps to improve:
> * Add to troubleshooting documentation
> * Make maxConcurrentMachineCreations default to 3
> * Check whether we are polling for machine creation too often.
> * Check how many requests we are hitting Amazon with (per created machine)
> * The number of requests per machine could vary from blueprint to blueprint
> (say if the blueprint is creating security networks, using other amazon
> services). Is there a way to throttle our requests to amazon and stay below a
> certain limit per second?
> * I've hit the error during machine tear down as well, so
> {{maxConcurrentMachineCreations}} is not enough to work around it
> Some docs on rate limits at
> http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html.
> Related: https://github.com/jclouds/legacy-jclouds/issues/1214