[ 
https://issues.apache.org/jira/browse/BROOKLYN-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15681915#comment-15681915
 ] 

ASF GitHub Bot commented on BROOKLYN-394:
-----------------------------------------

GitHub user aledsage opened a pull request:

    https://github.com/apache/brooklyn-server/pull/448

    BROOKLYN-394: increase jclouds retry/backoff time

    Question: Are 500ms and 6 retries sensible values? It feels to me like
    a large backoff is good for API calls to a cloud. I can see this might
    slow things down in some situations (e.g. when the failure was caused
    by a transient connectivity problem), but that seems unlikely to happen
    often. In all the important cases I can think of, a larger backoff +
    retry time seems desirable.
    
    When running `testCreateMany` to provision 20 VMs concurrently in AWS,
    I triggered rate-limiting on the `RunInstances` calls, getting back
    `503 Service Unavailable` for 6 of the 20 VMs:
    
    ```
    grep -E "JavaUrlHttpCommandExecutorService.*Receiving.* 503 Service Unavailable" brooklyn.debug.log
    2016-11-20 21:41:07,014 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-7]: Receiving response 305126632: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,027 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-17]: Receiving response -202425525: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,181 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-20]: Receiving response 1461817670: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,902 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-7]: Receiving response -412329992: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:07,951 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-17]: Receiving response -2106831550: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:08,094 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-20]: Receiving response -1404718861: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:09,575 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-11]: Receiving response 1776862310: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:11,419 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-15]: Receiving response 1334001839: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service Unavailable
    ```
    
    Here's the output for one of them:
    ```
    2016-11-20 21:41:07,774 DEBUG o.j.r.i.InvokeHttpMethod [pool-3-thread-13]: >> invoking RunInstances
    2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:08,191 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 1/6: delaying for 541 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:09,143 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 2/6: delaying for 2143 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service Unavailable
    2016-11-20 21:41:11,697 DEBUG o.j.a.h.AWSServerErrorRetryHandler [pool-3-thread-13]: Retry 3/6: delaying for 4681 ms: server error: [method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract org.jclouds.ec2.domain.Reservation org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1, null, ami-7d7bfc14, 1, 1, [Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:17,536 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService [pool-3-thread-13]: Receiving response 1803030217: HTTP/1.1 200 OK
    ```
    
    Note that some of the `RunInstances` calls only succeeded after backing
    off several times: the call above needed a 4.7 second backoff before it
    worked on the 4th attempt. I therefore suspect the previous settings
    were actually making things *worse* by retrying after 50ms, 100ms,
    200ms, 400ms and 800ms (e.g. making concurrent calls from other threads
    a lot more likely to fail, while not succeeding in any of the 5
    retries).
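
As a rough illustration of the arithmetic behind that suspicion, the sketch below sums the doubling schedule quoted in the description (50ms, 100ms, 200ms, 400ms, 800ms) and compares it with the three delays reported in the log excerpt above. The class and method names are hypothetical, and the doubling rule is only an assumption taken from the listed values; it is not claimed to be the formula jclouds uses internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Brooklyn or jclouds.
class BackoffSchedule {

    /** Delays in ms for an assumed doubling schedule: base, 2*base, 4*base, ... */
    static List<Long> doublingDelays(long baseMs, int retries) {
        List<Long> delays = new ArrayList<>();
        long delay = baseMs;
        for (int i = 0; i < retries; i++) {
            delays.add(delay);
            delay *= 2;
        }
        return delays;
    }

    /** Total time spent waiting across all retries, in ms. */
    static long total(List<Long> delays) {
        long sum = 0;
        for (long d : delays) {
            sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Schedule quoted in the description: 5 retries starting at 50ms.
        List<Long> previous = doublingDelays(50, 5);
        System.out.println(previous + " total=" + total(previous) + "ms");
        // prints: [50, 100, 200, 400, 800] total=1550ms

        // Delays taken from the "Retry n/6: delaying for ..." log lines above.
        List<Long> observed = Arrays.asList(541L, 2143L, 4681L);
        System.out.println(observed + " total=" + total(observed) + "ms");
        // prints: [541, 2143, 4681] total=7365ms
    }
}
```

Under that assumption, the earlier schedule would have given up after about 1.5 seconds of total waiting, whereas the call in the log needed over 7 seconds of backoff before it succeeded.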

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aledsage/brooklyn-server BROOKLYN-394-retry-backoff-time

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/brooklyn-server/pull/448.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #448
    
----
commit 18cdc98d36f74da10d8987382dba77994de3b75d
Author: Aled Sage <[email protected]>
Date:   2016-11-20T21:52:51Z

    BROOKLYN-394: increase jclouds retry/backoff time

----


> "Request limit exceeded" on Amazon
> ----------------------------------
>
>                 Key: BROOKLYN-394
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-394
>             Project: Brooklyn
>          Issue Type: Bug
>            Reporter: Svetoslav Neykov
>
> Any moderately sized blueprint could trigger {{Request limit exceeded}} on 
> Amazon (say kubernetes). The only way users have control over the request 
> rate is by setting {{maxConcurrentMachineCreations}} with the current 
> recommended value of 3 (see clocker.io).
> It's bad user experience if one needs to adapt the location based on the 
> blueprint.
> Possible steps to improve:
> * Add to troubleshooting documentation
> * Make maxConcurrentMachineCreations default to 3
> * Check whether we are polling for machine creation too often.
> * Check how many requests we are hitting Amazon with (per created machine)
> * The number of requests per machine could vary from blueprint to blueprint 
> (say if the blueprint is creating security networks, using other amazon 
> services). Is there a way to throttle our requests to amazon and stay below a 
> certain limit per second?
> * I've hit the error during machine tear down as well, so 
> {{maxConcurrentMachineCreations}} is not enough to work around
> Some docs on rate limits at 
> http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html.
> Related: https://github.com/jclouds/legacy-jclouds/issues/1214
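
One way to picture the "stay below a certain limit per second" idea from the issue is a client-side throttle that spaces calls evenly. The sketch below is a minimal illustration under that assumption; `SimpleThrottle` is a hypothetical class, not something that exists in Brooklyn or jclouds, and it says nothing about the limits AWS actually enforces.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical client-side throttle for illustration only.
class SimpleThrottle {
    private final long minIntervalNanos;
    private long nextAllowedNanos;

    /**
     * @param maxPerSecond maximum number of calls to allow per second
     * @param startNanos   clock reading at which the first call may run
     */
    SimpleThrottle(double maxPerSecond, long startNanos) {
        this.minIntervalNanos = (long) (1_000_000_000L / maxPerSecond);
        this.nextAllowedNanos = startNanos;
    }

    /** Reserves the next free slot and returns how long the caller must wait, in ns. */
    synchronized long reserve(long nowNanos) {
        long slot = Math.max(nowNanos, nextAllowedNanos);
        nextAllowedNanos = slot + minIntervalNanos;
        return slot - nowNanos;
    }

    /** Blocks the calling thread until its reserved slot arrives. */
    void acquire() throws InterruptedException {
        long waitNanos = reserve(System.nanoTime());
        if (waitNanos > 0) {
            TimeUnit.NANOSECONDS.sleep(waitNanos);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Allow at most 5 calls per second, shared by all threads using this instance.
        SimpleThrottle throttle = new SimpleThrottle(5.0, System.nanoTime());
        for (int i = 0; i < 3; i++) {
            throttle.acquire();
            System.out.println("call " + i + " would be sent now");
        }
    }
}
```

Each thread would call `acquire()` before issuing a cloud API request, so the limit applies across creation and tear-down alike rather than only to concurrent machine creations. Guava's `RateLimiter` provides a more complete version of the same idea.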



