GitHub user aledsage opened a pull request:

    https://github.com/apache/brooklyn-server/pull/448

    BROOKLYN-394: increase jclouds retry/backoff time

    Question: are a 500ms initial delay and 6 retries sensible values? A 
large backoff seems right for API calls to a cloud. It might slow things down 
in some situations (e.g. after a transient connectivity problem), but I expect 
that to be rare. In all the important cases I can think of, a larger backoff 
and retry time seems desirable.
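
    For illustration, a minimal sketch of how the two values in question could be expressed as jclouds override properties. It assumes the property keys `jclouds.retries-delay-start` and `jclouds.max-retries` (as defined in `org.jclouds.Constants`, to the best of my knowledge); the class and method names are hypothetical, and this is not necessarily how the change in this PR is wired up.

```java
import java.util.Properties;

public class RetrySettingsSketch {
    // Assumed jclouds property keys (see org.jclouds.Constants); verify against your jclouds version.
    static final String RETRY_DELAY_START = "jclouds.retries-delay-start";
    static final String MAX_RETRIES = "jclouds.max-retries";

    /** Builds override properties for the initial retry delay (ms) and the retry count. */
    static Properties retryOverrides(long delayStartMs, int maxRetries) {
        Properties overrides = new Properties();
        overrides.setProperty(RETRY_DELAY_START, Long.toString(delayStartMs));
        overrides.setProperty(MAX_RETRIES, Integer.toString(maxRetries));
        return overrides;
    }

    public static void main(String[] args) {
        // The values under discussion: 500ms initial delay, 6 retries.
        Properties overrides = retryOverrides(500, 6);
        overrides.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

    Properties like these would typically be passed as overrides when building the jclouds context.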
    
    When running `testCreateMany` to provision 20 VMs concurrently in AWS, 
I managed to trigger rate-limiting on `RunInstances` calls, getting back `503 
Service Unavailable` for 6 of the 20 VMs:
    
    ```
    grep -E "JavaUrlHttpCommandExecutorService.*Receiving.* 503 Service Unavailable" brooklyn.debug.log
    2016-11-20 21:41:07,014 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-7]: Receiving response 305126632: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:07,027 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-17]: Receiving response -202425525: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:07,181 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-20]: Receiving response 1461817670: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:07,902 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-7]: Receiving response -412329992: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:07,951 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-17]: Receiving response -2106831550: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:08,094 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-20]: Receiving response -1404718861: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:09,575 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-11]: Receiving response 1776862310: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:11,419 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-15]: Receiving response 1334001839: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service 
Unavailable
    ```
    
    Here's the output for one of them:
    ```
    2016-11-20 21:41:07,774 DEBUG o.j.r.i.InvokeHttpMethod [pool-3-thread-13]: 
>> invoking RunInstances
    2016-11-20 21:41:08,189 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response -1425449702: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:08,191 DEBUG o.j.a.h.AWSServerErrorRetryHandler 
[pool-3-thread-13]: Retry 1/6: delaying for 541 ms: server error: 
[method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract 
org.jclouds.ec2.domain.Reservation 
org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1,
 null, ami-7d7bfc14, 1, 1, 
[Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:09,141 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response -1388229651: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:09,143 DEBUG o.j.a.h.AWSServerErrorRetryHandler 
[pool-3-thread-13]: Retry 2/6: delaying for 2143 ms: server error: 
[method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract 
org.jclouds.ec2.domain.Reservation 
org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1,
 null, ami-7d7bfc14, 1, 1, 
[Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:11,695 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response 1602574625: HTTP/1.1 503 Service 
Unavailable
    2016-11-20 21:41:11,697 DEBUG o.j.a.h.AWSServerErrorRetryHandler 
[pool-3-thread-13]: Retry 3/6: delaying for 4681 ms: server error: 
[method=org.jclouds.aws.ec2.features.AWSInstanceApi.public abstract 
org.jclouds.ec2.domain.Reservation 
org.jclouds.aws.ec2.features.AWSInstanceApi.runInstancesInRegion(java.lang.String,java.lang.String,java.lang.String,int,int,org.jclouds.ec2.options.RunInstancesOptions[])[us-east-1,
 null, ami-7d7bfc14, 1, 1, 
[Lorg.jclouds.ec2.options.RunInstancesOptions;@17ed1f23], request=POST 
https://ec2.us-east-1.amazonaws.com/ HTTP/1.1]
    2016-11-20 21:41:17,536 DEBUG o.j.h.i.JavaUrlHttpCommandExecutorService 
[pool-3-thread-13]: Receiving response 1803030217: HTTP/1.1 200 OK
    ```
    
    Note that some of the `RunInstances` calls did not succeed until they had 
backed off several times; the call above needed a 4.7 second backoff before it 
worked on the 4th attempt. I therefore suspect that retrying after 50ms, 100ms, 
200ms, 400ms and 800ms was actually making things *worse* (e.g. making 
concurrent calls from other threads much more likely to fail, while not 
succeeding in any of the 5 retries).
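
    To put numbers on that, here is a small sketch comparing the total backoff of the old schedule as described above (50ms doubling over 5 retries) with the delays logged for `pool-3-thread-13` (541, 2143 and 4681 ms). The class and method names are hypothetical, and the doubling helper only models the schedule quoted in the text, not the actual jclouds retry handler.

```java
import java.util.Arrays;

public class BackoffComparison {
    /** Delays for a doubling schedule: start, 2*start, 4*start, ... */
    static long[] doublingDelays(long startMs, int retries) {
        long[] delays = new long[retries];
        long delay = startMs;
        for (int i = 0; i < retries; i++) {
            delays[i] = delay;
            delay *= 2;
        }
        return delays;
    }

    /** Sum of all delays in milliseconds. */
    static long total(long[] delaysMs) {
        return Arrays.stream(delaysMs).sum();
    }

    public static void main(String[] args) {
        // Old schedule as described in the text: 50, 100, 200, 400, 800 ms.
        long[] oldSchedule = doublingDelays(50, 5);
        // Delays taken from the log excerpt for pool-3-thread-13.
        long[] observed = {541, 2143, 4681};

        System.out.println("old schedule total backoff (ms): " + total(oldSchedule));
        System.out.println("observed total backoff (ms):     " + total(observed));
    }
}
```

    Under these assumptions the old schedule would exhaust all 5 retries after about 1.5 seconds of backoff, whereas the logged call only succeeded after more than 7 seconds of backoff.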

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aledsage/brooklyn-server 
BROOKLYN-394-retry-backoff-time

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/brooklyn-server/pull/448.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #448
    
----
commit 18cdc98d36f74da10d8987382dba77994de3b75d
Author: Aled Sage <[email protected]>
Date:   2016-11-20T21:52:51Z

    BROOKLYN-394: increase jclouds retry/backoff time

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
