[ https://issues.apache.org/jira/browse/BROOKLYN-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15690017#comment-15690017 ]

Aled Sage commented on BROOKLYN-394:
------------------------------------

[~alex.heneveld] (cc [~andreaturli]) I've raised 
https://issues.apache.org/jira/browse/JCLOUDS-1203, but that is just about 
changing the rate-limit default retry/backoff times.

I'd reword from "improved it a bit" to "improved it a huge amount". Though I 
confess I've not repeated the experiments! Previously, if you tried 
provisioning 20 machines concurrently, maybe around 25% of them would get 
rate-limited, and most of those would then fail because we retried very 
quickly. Now that we back off for longer, those are likely to all succeed.

If we repeated this for 200 VMs, then I agree we'd likely still have serious 
problems with some of the VMs failing.

If we were to somehow have all provisioning threads in that jclouds 
{{ComputeService}} collaborating on their back-off, then I agree we'd improve 
things further.
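
For illustration only, a rough sketch of what I mean by "collaborating on their 
back-off" -- this is hypothetical code, not anything in Brooklyn or jclouds 
today. A single instance would be shared by all provisioning threads, so one 
thread being rate-limited slows all of them down rather than just itself:

{code:java}
// Hypothetical sketch only -- not existing Brooklyn/jclouds code.
import java.util.concurrent.atomic.AtomicLong;

public class SharedBackoff {
    private final AtomicLong pauseUntil = new AtomicLong(0);    // epoch millis
    private final AtomicLong currentDelay = new AtomicLong(0);  // millis

    private final long initialDelayMs;
    private final long maxDelayMs;

    public SharedBackoff(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Every thread calls this before making an AWS request. */
    public void awaitPermission() throws InterruptedException {
        long wait = pauseUntil.get() - System.currentTimeMillis();
        if (wait > 0) Thread.sleep(wait);
    }

    /** Any thread that sees "Request limit exceeded" calls this. */
    public void onRateLimited() {
        long delay = currentDelay.updateAndGet(
                d -> Math.min(maxDelayMs, (d == 0) ? initialDelayMs : d * 2));
        long until = System.currentTimeMillis() + delay;
        pauseUntil.accumulateAndGet(until, Math::max);
    }

    /** Called after a successful request, so the shared delay decays again. */
    public void onSuccess() {
        currentDelay.updateAndGet(d -> d / 2);
    }
}
{code}

Guava's {{RateLimiter}} would be another way to get similar cross-thread 
cooperation, by throttling the request rate up front rather than reacting to 
the rate-limit errors.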

But we'd still hit problems if there were multiple Brooklyn instances trying to 
provision a lot of VMs in the same AWS account.

---
I'd argue that the current exponential backoff is a fine compromise between 
simplicity and functionality, as long as we back off for long enough. If we're 
confident that "Request limit exceeded" definitely means rate-limiting, 
then arguably we should keep trying for a very long time! We should probably 
back off to intervals a lot longer than 5 seconds, and we should keep retrying 
for several minutes.
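
To make that concrete, here's a rough sketch of the shape of retry loop I'm 
arguing for. This is not the actual jclouds retry handler, and the numbers are 
just assumptions: intervals that grow well past 5 seconds, capped, with an 
overall budget of several minutes before giving up.

{code:java}
import java.util.concurrent.Callable;

public class RateLimitRetry {
    // Illustrative numbers only -- the point is intervals well beyond 5 seconds
    // and an overall budget of several minutes.
    private static final long INITIAL_DELAY_MS = 5_000;
    private static final long MAX_DELAY_MS = 60_000;
    private static final long TOTAL_BUDGET_MS = 10 * 60_000;

    public static <T> T retryOnRateLimit(Callable<T> call) throws Exception {
        long delayMs = INITIAL_DELAY_MS;
        long deadline = System.currentTimeMillis() + TOTAL_BUDGET_MS;
        while (true) {
            try {
                return call.call();
            } catch (Exception e) {
                // Crude check, for illustration only; real code should inspect
                // the AWS error code rather than the message text.
                boolean rateLimited = String.valueOf(e.getMessage())
                        .contains("Request limit exceeded");
                if (!rateLimited || System.currentTimeMillis() + delayMs > deadline) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs = Math.min(MAX_DELAY_MS, delayMs * 2);
            }
        }
    }
}
{code}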

> "Request limit exceeded" on Amazon
> ----------------------------------
>
>                 Key: BROOKLYN-394
>                 URL: https://issues.apache.org/jira/browse/BROOKLYN-394
>             Project: Brooklyn
>          Issue Type: Bug
>            Reporter: Svetoslav Neykov
>            Assignee: Aled Sage
>             Fix For: 0.10.0
>
>
> Any moderately sized blueprint could trigger {{Request limit exceeded}} on 
> Amazon (say Kubernetes). The only way users can control the request rate is 
> by setting {{maxConcurrentMachineCreations}}; the current recommended value 
> is 3 (see clocker.io).
> It's a bad user experience if one needs to adapt the location to the 
> blueprint.
> Possible steps to improve:
> * Add to troubleshooting documentation
> * Make {{maxConcurrentMachineCreations}} default to 3
> * Check whether we are polling too often for machine creation.
> * Check how many requests we make to Amazon (per created machine)
> * The number of requests per machine could vary from blueprint to blueprint 
> (say if the blueprint is creating security networks, or using other Amazon 
> services). Is there a way to throttle our requests to Amazon and stay below 
> a certain limit per second?
> * I've hit the error during machine tear down as well, so 
> {{maxConcurrentMachineCreations}} is not enough to work around it.
> Some docs on rate limits at 
> http://docs.aws.amazon.com/general/latest/gr/aws_service_limits.html.
> Related: https://github.com/jclouds/legacy-jclouds/issues/1214



