Hey Kaan,
I'll echo what Jesse has said about the new efforts to bring the developer
community and Cloud Platform Support closer together. I look forward to the
good discussions that can be had here, and to working together on
stackoverflow and the public issue tracker to make the best use of those
forums.
Thanks for taking the time to bring up the issues you've been seeing. I'll
number them one to four, in the order they appeared in your post. For each,
I'll give my impression of what the issue may be, or what information is
missing for a good issue report, along with some advice on where to go next
to get support eyes on any potential issues.
1. "Request was aborted after waiting too long to attempt to service
your request"
- If you've observed log lines with this error when a number of tasks
in-queue seem to overload the processing power of your available instances,
this may indicate a platform issue, or it may indicate an issue in your own
app's config/code. It's not possible to tell without more details, such as
the following:
* the .yaml/.xml config files (mostly the scaling settings are of
interest)
* a brief description of what the system was doing, preferring code
snippets, numbers, and logs to an informal verbal description
* a time-frame and the name of an affected instance
- With such details, an adequate issue report can be created and
dealt with in the public issue tracker
<https://code.google.com/p/googleappengine/issues/list>, or a valid stack
overflow question
<http://stackoverflow.com/questions/tagged/google-app-engine> can be
created, depending on whether you perceive it to be a platform or user
code issue.
2. "google.appengine.api.taskqueue.taskqueue.TransientError"
- As documented here
<https://cloud.google.com/appengine/docs/python/taskqueue/overview-pull#Python_Leasing_tasks>,
this can happen when using Pull queues. As you correctly observe, it can be
related to rate-limiting in the infrastructure, although you feel the
details of how the rates are set are not sufficiently documented. It's
likely that this derives from calling lease_tasks() on the queue too often,
but it's true that we can't be sure.
- I definitely understand you here and encourage you to create a
public issue tracker thread which can be starred by other users to
demonstrate an interest in more detailed documentation around this limit.
- In the meantime, these errors still need to be handled. On a platform
which does allow you to scale up aggressively, in the context of a
data-center (network) with shared but well-isolated and ample resources,
error responses such as these will occur periodically. A well-scalable app
can ride out transient errors and rate-limiting with a small application of
exponential back-off, avoiding spikes, etc. I encourage you to take the
advice of the docs and rate-limit when you see this error, as it's likely
that lease_tasks() is being called too often per queue.
- If you find that a behaviour still appears anomalous to you, that
is to say if the system's behaviour seems out of sync with the
documented behaviour, then you should open a public issue tracker
<https://code.google.com/p/googleappengine/issues/list> issue with
sufficient information to allow investigation. A report that contains
sufficient information is likely to produce a positive result, and
quickly.
3. "DeadlineExceededError"
- This issue can also have the same cause as 2., and it's worth
investigating. My advice again is to create a public issue tracker issue as
soon as you notice something that you perceive to be anomalous about the
behaviour of any App Engine system.
4. "push/pull queue anomalies"
- I'm unsure what you mean by this, although as above, if you feel
there's an issue on the platform, I want to encourage you to report it
adequately, as we're here and happy to help.
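To make the back-off advice in point 2 a bit more concrete, here is a
minimal Python sketch. It is not App Engine code as such: TransientError
below is a local stand-in for
google.appengine.api.taskqueue.TransientError, and call_with_backoff and
its parameters are names I've made up for illustration. The delays are
arbitrary starting points, not documented limits.

```python
import random
import time


class TransientError(Exception):
    """Local stand-in for google.appengine.api.taskqueue.TransientError."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(), retrying on TransientError with exponential back-off.

    The wait before retry n (0-based) is a random value between 0 and
    min(max_delay, base_delay * 2**n), so that many workers don't all
    retry in lock-step and create a fresh spike.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller log/handle it
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(rng() * cap)
```

In an app, fn would presumably wrap the real lease call, something like
lambda: queue.lease_tasks(60, 100), with the real TransientError caught
instead of the stand-in.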
So, to conclude: investigate each of these issues against the documented
behaviour and, if necessary, develop it into a proper issue report for the
platform. The public issue tracker issue you create will be picked up and
brought to the attention of platform developers / engineers / support. If,
rather than a platform issue, it looks like the issue is related to your
specific use of the services on the platform, you should instead create a
stackoverflow question on the related tags, to get support in that form.
Finally, to address what you say in parentheses just before the second
"------", it's definitely possible to implement easing and rate-limiting on
pull queue task execution, since the frequency of task lease/execution is
tunable in whatever timing logic you set up.
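As a rough illustration of what such timing logic for a pull queue worker
could look like, here is a small Python sketch that eases in: it starts
with long pauses between leases and ramps down to a target interval. All
names here (eased_intervals, run_worker, lease_batch, handle) are
hypothetical, and the lease call itself is passed in as a plain function
so the sketch doesn't depend on the SDK.

```python
import time
from itertools import islice


def eased_intervals(start_interval, target_interval, steps):
    """Yield the wait (in seconds) to use after each lease.

    Ramps linearly from start_interval to target_interval over `steps`
    leases, then keeps yielding target_interval indefinitely.
    """
    for i in range(steps):
        frac = i / float(steps)
        yield start_interval + (target_interval - start_interval) * frac
    while True:
        yield target_interval


def run_worker(lease_batch, handle, intervals, max_batches,
               sleep=time.sleep):
    """Lease and process up to max_batches batches, pausing in between.

    lease_batch() returns an iterable of tasks; handle(task) processes
    one task; intervals supplies the pause after each batch.
    """
    for wait in islice(intervals, max_batches):
        for task in lease_batch():
            handle(task)
        sleep(wait)
```

For example, eased_intervals(5.0, 1.0, 4) yields waits of 5, 4, 3 and 2
seconds and then stays at 1 second. The same idea works in reverse for
easing out.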
For push queues, to implement easing, you can define a stepped gradient of
queues with different configured processing rates, bucket sizes, etc.
<https://cloud.google.com/appengine/docs/java/config/queue#Defining_Push_Queues_and_Processing_Rates>.
The task-adding logic can then read how full the various queues currently
are (you can store information about the queue fullness/rate, etc. in
Memcache or Datastore, or just use API calls to the Task Queue API),
possibly along with API calls to get the number of instances in the handler
module
<https://cloud.google.com/appengine/docs/python/modules/functions#get_num_instances>,
to determine which queue to step up to / include in the rotation of queues
which receive tasks (at your discretion) when adding tasks with given
payloads, etc.
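A minimal Python sketch of that queue-selection step might look like the
following. The queue names, the pick_queue function and the
backlog-per-instance threshold are all assumptions of mine for
illustration. The backlog numbers and instance count are passed in as
plain values, so where they come from (Memcache, Datastore, queue
statistics, get_num_instances) is left to you.

```python
# Hypothetical queue names, ordered slowest to fastest. Each would be
# defined in queue.yaml / queue.xml with its own rate and bucket size.
QUEUE_STEPS = ["work-slow", "work-medium", "work-fast"]


def pick_queue(backlog_by_queue, num_instances, steps=QUEUE_STEPS,
               max_backlog_per_instance=50):
    """Return the name of the queue the next task should be added to.

    Start with the slowest queue and only step up to a faster one once
    the slower queue's backlog reaches what the current instances are
    assumed to absorb. The threshold of 50 is arbitrary.
    """
    limit = max_backlog_per_instance * max(1, num_instances)
    for name in steps[:-1]:
        if backlog_by_queue.get(name, 0) < limit:
            return name
    return steps[-1]  # every slower queue is full: use the fastest
```

The returned name would then be used as the queue_name when adding the
task. Because the faster queues only receive tasks once the slower ones
back up, the overall processing rate steps up gradually instead of
spiking.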
Using the basic building blocks, some complex timing logic can be
implemented, and if you feel that you'd like to make a feature request such
as "provide easing parameter in queue configuration", describing how it
works, the place for feature requests is the public issue tracker
<https://code.google.com/p/googleappengine/issues/list>.
I hope you've come away from this feeling heard, and with a better
understanding of where and how to get support with any issues you may
encounter. I tried to address each of the issues you brought up to make
sure you get useful information.
Have a great day!
- Nick
On Tuesday, May 26, 2015 at 4:57:06 AM UTC-4, Kaan Soral wrote:
>
> I've been using App Engine for probably something like 5 years, I have one
> major app that has been running for 5 years, it's very well polished, and
> the traffic and behaviour of the app is very predictable *knocking on wood*
> I have another app that I've been working on for 3 years, it didn't take
> off yet, the new app is unpredictable in behaviour, it's vast and
> unthrottled
>
> While the old app has been handling millions of requests without errors
> and issues, the new one is failing on even the simplest tasks, the logs are
> filled with TransientError's, instance failures, transaction failures, the
> whole thing is chaotic
>
> The old app has throttled queues and basic tasks, the throughput is well
> calibrated to complete all the tasks in 5 hours, using optimal amount of
> instances, the traffic is regular, it eases in and eases out throughout the
> day (without throttling, the old app was in similar state before)
> The new app is built to perform, so its queues have no limits, it trusts
> App Engine to scale as much as it can
>
> Well turns out that trust isn't well placed, App Engine is supposed to
> scale on its own, yet when you leave the limits to the App Engine, it
> fails to perform
> You might ask: "Why would I use App Engine if I'm going to manually scale
> the limits myself?" - That's a good question, If you're going to have to
> adjust the limits and throughput manually while your app grows, you might
> as well use AWS or a similar more reliable service
>
> This is mostly a rant post, but the advice is still solid, one has to
> manually calibrate the throughput of routines to prevent app spikes, the
> instance births and deaths should always be eased in and eased out,
> otherwise various services of app engine fail to perform
> On the bright side, throttling also reduces the costs significantly, so
> it's a good idea to always keep an eye on the app and manually calibrate
> all routines - on the other side, if your app gains additional traffic
> without your supervision, these routines will hog and halt
>
> ------
>
> On a more technical side, some of these errors are:
> "Request was aborted after waiting too long to attempt to service your
> request." - they come in 100's - flood the logs - these are taskqueue
> push tasks, so the error is pretty stupid, if they can't be handled, they
> should be left in the queue
> "google.appengine.api.taskqueue.taskqueue.TransientError" - these are
> from pull queues, there are invisible/untold limits of pull queues, this is
> also very concerning, because if your app grows, your scale might be bound
> by these limits, so try not to use pull queues too much
> "DeadlineExceededError's" - these are pretty random and rare, yet when
> you run thousands of tasks, you get these in your logs, they might be
> omitted
> Transaction errors and anomalies: these used to happen a lot, but I
> switched to a pull queue based logic to prevent them, now they are replaced
> by pull queue anomalies
>
> (It would have been great if limits and capacities of each service was
> more transparent, and I really think taskqueues need some eased bucket
> configurations, things that will help task batches to be executed in an
> eased manner, currently the only way to achieve this is to put flat and low
> throughput limits - similarly, same kind of control can be achieved on the
> instance scheduler level)
>
> ------
>
> Also, after 5 years, I gave up on app engine support, during a time we
> used to get actual support from this google groups, currently it's just
> random initial replies and no follow ups, so unless you are paying $500 or
> something monthly for support, don't expect much support, you are on your
> own to detect the issues and prevent them through experimentation and
> volunteer help
>
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/google-appengine/241c324c-9bcf-4428-bb7b-e75727f90fe1%40googlegroups.com.