[google-appengine] Re: My Best App Engine Advice Would Be: Throttle Well

Nick (Cloud Platform Support) Tue, 26 May 2015 16:23:12 -0700

Hey Kaan,

I'll echo what Jesse has said about the new efforts in place to provide 
closer work between the community of developers and Cloud Platform Support, 
and I look forward to the good discussions that can be had here, as well as 
working together on stackoverflow and the public issue tracker to make the 
best use of those forums.


Thanks for taking the time to bring up some issues you've been seeing. In 
regards to each of these issues, I'll enumerate them from one to four, 
according to the order they appeared in your post. I'll discuss my 
impression of what the issue may be, or what information is missing in 
order to make a good issue report. I'll also generally comment with some 
advice on where to move next in getting some support eyes on any potential 
issues.


   1. "Request was aborted after waiting too long to attempt to service 
   your request"
   - If you've observed log lines with this error appearing when you have a 
      number of tasks in-queue which seem to overload the processing power of 
      your available instances, this may indicate a platform issue, or it may 
      also indicate an issue in your own app's config/code. It's not possible 
      to tell without more details, such as the following:
          * the .yaml/.xml config files (mostly the scaling settings are of 
      interest)
          * a brief description of what the system was doing, preferring 
      code snippets, numbers, and logs to informal verbal description
          * a time-frame and the name of an affected instance
      - With such details, an adequate issue report can be created and 
      dealt with in the public issue tracker 
      <https://code.google.com/p/googleappengine/issues/list>, or a valid Stack 
      Overflow question 
      <http://stackoverflow.com/questions/tagged/google-app-engine> can be 
      created, depending on whether you perceive it to be a platform or user 
      code issue.
   2. "google.appengine.api.taskqueue.taskqueue.TransientError"
   - As documented here 
      <https://cloud.google.com/appengine/docs/python/taskqueue/overview-pull#Python_Leasing_tasks>,
      it's possible this can happen when using Pull queues. This can be, as you 
      correctly observe, related to rate-limiting in the infrastructure, 
      although you feel the details of how rates are set are not sufficiently 
      documented. It's likely that this derives from attempting to 
      lease_tasks() from the queue too often, but it's true that we can't be 
      sure.
      - I definitely understand you here and encourage you to create a 
      public issue tracker thread which can be starred by other users to 
      demonstrate an interest in more detailed documentation around this limit. 
I
      - In the meantime, these errors still need to be handled. On a platform 
      which does allow you to scale up aggressively, in the context of a 
      data-center (network) with shared but well-isolated and ample resources, 
      error responses such as these will occur periodically. A well-scalable 
      app can ride out transient errors and rate-limiting with a small 
      application of exponential back-off, spike avoidance, etc. I encourage 
      you to take the advice of the docs and rate-limit when you see this 
      error, as it's likely that lease_tasks() is being called too frequently 
      on the queue.
      - If you find that a behaviour still appears anomalous to you - that 
      is to say if a behaviour of the system seems out of sync with the 
      documented behaviour - then you should open a public issue tracker 
      <https://code.google.com/p/googleappengine/issues/list> issue with 
      sufficient information to allow investigation. If the issue report 
      contains sufficient information, it will be likely to produce a positive 
      result, and quickly.
   3. "DeadlineExceededError"
   - This issue can also arise from the same cause as item 2, and it's worth 
      investigating. My advice again is to create a public issue tracker issue 
      as soon as you notice something that you perceive to be anomalous about 
      the behaviour of any App Engine system.
   4. "push/pull queue anomalies"
   - I'm unsure what you mean by this, although as above, if you feel 
      there's an issue on the platform, I want to encourage you to report it 
      adequately, as we're here and happy to 
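
To make the back-off advice in item 2 a bit more concrete, here is a minimal 
Python sketch. It is not an official recipe: the names backoff_delays and 
call_with_backoff, and all the default numbers, are my own assumptions for 
illustration. The callable you pass in would wrap whatever call is being 
rate-limited (for a pull queue, presumably a lease_tasks() call), and the 
exception class would be the transient error you are seeing.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0, attempts=6):
    """Yield `attempts` delays: base, base*factor, ... each capped at `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(fn, transient_errors, sleep=time.sleep,
                      jitter=random.random, **backoff_kwargs):
    """Call fn(); on a transient error, wait and retry with growing delays.

    `transient_errors` is an exception class or a tuple of classes.
    Assumption: on App Engine this would be taskqueue.TransientError and
    `fn` would wrap the rate-limited call.
    """
    for delay in backoff_delays(**backoff_kwargs):
        try:
            return fn()
        except transient_errors:
            # Jitter: sleep a random fraction of the delay so that many
            # workers hitting the same limit do not retry in lock-step.
            sleep(delay * jitter())
    # Final attempt after the last delay: let any error reach the caller.
    return fn()
```

The sleep and jitter parameters exist only so the behaviour can be checked 
without real waiting; in normal use the defaults are fine.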
   
So, to conclude: each of these issues should first be checked against the 
documented behaviour. Where one turns out to warrant a proper issue report 
for the platform, the public issue tracker issue you create will be picked 
up and brought to the attention of platform developers / engineers / 
support. If, rather than a platform issue, it looks like the issue is 
related to your specific use of the services on the platform, you should 
instead create a Stack Overflow question under the related tags, to get 
support in that form.

Finally, to address what you say in parentheses just before the second 
"------", it's definitely possible to implement easing and rate-limiting on 
pull queue task execution, since the frequency of task lease/execution is 
tunable in whatever timing logic you set up.
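
As a rough illustration of that kind of timing logic, the Python sketch 
below eases the lease pace in over a warm-up period instead of starting at 
full speed. Everything here is an assumption made for the example: the names 
eased_interval and run_worker, the linear ramp, and the default numbers. The 
lease and handle callables stand in for your own lease and processing code.

```python
def eased_interval(elapsed, ramp_seconds, slow_interval, fast_interval):
    """Seconds to wait between lease calls, `elapsed` seconds into a batch.

    Starts at `slow_interval` and moves linearly to `fast_interval` over
    `ramp_seconds`, so the lease rate rises gradually rather than spiking.
    """
    if ramp_seconds <= 0 or elapsed >= ramp_seconds:
        return fast_interval
    progress = max(elapsed, 0.0) / ramp_seconds
    return slow_interval + (fast_interval - slow_interval) * progress


def run_worker(lease, handle, clock, sleep, max_rounds,
               ramp_seconds=60.0, slow_interval=5.0, fast_interval=0.5):
    """Lease and process tasks at an eased pace for `max_rounds` rounds.

    `lease` returns a list of tasks and `handle` processes one task; both
    are hypothetical callables supplied by the caller. `clock` returns the
    current time in seconds and `sleep` waits for the given seconds.
    """
    start = clock()
    processed = 0
    for _ in range(max_rounds):
        for task in lease():
            handle(task)
            processed += 1
        # Wait longer early in the batch, shorter once the ramp has finished.
        sleep(eased_interval(clock() - start, ramp_seconds,
                             slow_interval, fast_interval))
    return processed
```

In real use clock and sleep would be time.time and time.sleep; they are 
parameters here so the pacing can be exercised without waiting.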

For push queues, to implement easing, you can define a stepped gradient of 
queues with different configured processing rates, bucket sizes, etc. 
<https://cloud.google.com/appengine/docs/java/config/queue#Defining_Push_Queues_and_Processing_Rates>.
The task-adding logic can then read how full the various queues currently 
are (you can store information about queue fullness/rate, etc. in Memcache 
or Datastore, or just use API calls to the Task Queue API), possibly along 
with API calls to get the number of instances in the handler module 
<https://cloud.google.com/appengine/docs/python/modules/functions#get_num_instances>.
With that information it can decide, at your discretion, which queue to 
step up to or include in the rotation of queues that receive tasks when 
adding tasks with given payloads.
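
The selection step of that scheme could look something like the Python 
sketch below. It only shows the decision logic; the function name 
pick_queue, the queue names, and the per-instance thresholds are all 
hypothetical. The backlog figures and the instance count are passed in as 
plain values, which you would fill from Memcache/Datastore or from the Task 
Queue and modules API calls mentioned above.

```python
def pick_queue(tiers, backlog, instances=1):
    """Return the first tier whose backlog is below its threshold.

    tiers: list of (queue_name, max_backlog_per_instance), ordered from the
           slowest-configured queue to the fastest.
    backlog: dict of queue_name -> number of tasks currently waiting.
    instances: number of instances currently serving the handler module.

    If every tier is at or over its threshold, the last (fastest) tier is
    returned so that tasks are never dropped.
    """
    scale = max(instances, 1)
    for name, per_instance_limit in tiers:
        if backlog.get(name, 0) < per_instance_limit * scale:
            return name
    return tiers[-1][0]


# Hypothetical queue names, each configured with a higher rate than the last.
TIERS = [("ease-slow", 10), ("ease-medium", 100), ("ease-fast", 1000)]
```

With these example tiers, an empty backlog selects "ease-slow", and the 
logic steps up to "ease-medium" once "ease-slow" holds 10 or more tasks per 
instance.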

Using the basic building blocks, some complex timing logic can be 
implemented, and if you feel that you'd like to make a feature request such 
as "provide easing parameter in queue configuration", describing how it 
works, the place for feature requests is the public issue tracker 
<https://code.google.com/p/googleappengine/issues/list>.

I hope you've come away from this feeling heard, and with a better 
understanding of where and how to get support with any issues you may 
encounter. I tried to address each of the issues you brought up to make 
sure you get useful information.

Have a great day!


- Nick

On Tuesday, May 26, 2015 at 4:57:06 AM UTC-4, Kaan Soral wrote:
>
> I've been using App Engine for probably something like 5 years, I have one 
> major app that has been running for 5 years, it's very well polished, and 
> the traffic and behaviour of the app is very predictable *knocking on wood*
> I have another app that I've been working on for 3 years, it didn't take 
> off yet, the new app is unpredictable in behaviour, it's vast and 
> unthrottled
>
> While the old app has been handling millions of requests without errors 
> and issues, the new one is failing on even the simplest tasks, the logs are 
> filled with TransientError's, instance failures, transaction failures, the 
> whole thing is chaotic
>
> The old app has throttled queues and basic tasks, the throughput is well 
> calibrated to complete all the tasks in 5 hours, using optimal amount of 
> instances, the traffic is regular, it eases in and eases out throughout the 
> day (without throttling, the old app was in similar state before)
> The new app is built to perform, so it's queues have no limits, it trusts 
> App Engine to scale as much as it can
>
> Well turns out that trust isn't well placed, App Engine is supposed to 
> scale on it's own, yet when you leave the limits to the App Engine, it 
> fails to perform
> You might ask: "Why would I use App Engine if I'm going to manually scale 
> the limits myself?" - That's a good question, If you're going to have to 
> adjust the limits and throughput manually while your app grows, you might 
> as well use AWS or a similar more reliable service
>
> This is mostly a rant post, but the advice is still solid, one has to 
> manually calibrate the throughput of routines to prevent app spikes, the 
> instance births and deaths should always be eased in and eased out, 
> otherwise various services of app engine fail to perform
> On the bright side, throttling also reduces the costs significantly, so 
> it's a good idea to always keep an eye on the app and manually calibrate 
> all routines - on the other side, if your app gains additional traffic 
> without your supervision, these routines will hog and halt
>
> ------
>
> On a more technical side, some of these errors are:
> "Request was aborted after waiting too long to attempt to service your 
> request." - they come in 100's - flood the logs - these are taskqueue 
> push tasks, so the error is pretty stupid, if they can't be handled, they 
> should be left in the queue
> "google.appengine.api.taskqueue.taskqueue.TransientError" - these are 
> from pull queues, there are invisible/untold limits of pull queues, this is 
> also very concerning, because if your app grows, your scale might be bound 
> by these limits, so try not to use pull queues too much
> "DeadlineExceededError's" - these are pretty random and rare, yet when 
> you run thousands of tasks, you get these in your logs, they might be 
> omitted
> Transactions errors and anomalies: these used to happen a lot, but I 
> switched to a pull queue based logic to prevent them, now they are replaced 
> by pull queue anomalies
>
> (It would have been great if limits and capacities of each service was 
> more transparent, and I really think taskqueues need some eased bucket 
> configurations, things that will help task batches to be executed in an 
> eased manner, currently the only way to achieve this is to put flat and low 
> throughput limits - similarly, same kind of control can be achieved on the 
> instance scheduler level)
>
> ------
>
> Also, after 5 years, I gave up on app engine support, during a time we 
> used to get actual support from this google groups, currently it's just 
> random initial replies and no follow ups, so unless you are paying $500 or 
> something monthly for support, don't expect much support, you are on your 
> own to detect the issues and prevent them through experimentation and 
> volunteer help
>

