Re: [google-appengine] Re: Idempotence & multiple task execution

Eli Jones Thu, 09 Sep 2010 09:41:25 -0700

How did I determine concurrent execution?

I determined that I had concurrent task execution because the task_name
shows up in the logs, and one named task successfully ran twice.  The
execution that finished last threw a TaskAlreadyExistsError when it tried to
add the next chained task to the queue: each named task has a specifically
defined name for the next task in the chain, and the execution that finished
first had already added that next named task. (This is why it is absolutely
important to use named tasks when chaining.. otherwise some sort of random
error can fork your tasks.)
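
A rough sketch of that name-collision behaviour, in case it helps.  This is
not the real taskqueue API: FakeQueue, TaskAlreadyExists, run_step and the
"chain-step-N" naming are all made-up stand-ins that only model "a second add
with the same name is rejected".

```python
# In-memory stand-in for a queue that rejects duplicate task names.
# FakeQueue and TaskAlreadyExists are hypothetical, not the App Engine API.


class TaskAlreadyExists(Exception):
    """Raised when a task with the same name was already added."""


class FakeQueue(object):
    def __init__(self):
        self.names = set()   # every task name ever added
        self.pending = []    # task names waiting to run

    def add(self, name):
        if name in self.names:
            raise TaskAlreadyExists(name)
        self.names.add(name)
        self.pending.append(name)


def run_step(queue, step):
    """Finish one step, then enqueue the next step under a fixed name."""
    next_name = "chain-step-%d" % (step + 1)
    try:
        queue.add(next_name)
        return "added " + next_name
    except TaskAlreadyExists:
        # Another execution of this same step got there first, so the
        # chain must not fork: stop here instead of adding a second copy.
        return "duplicate of step %d, %s already queued" % (step, next_name)


queue = FakeQueue()
first = run_step(queue, 1)    # normal execution of step 1
second = run_step(queue, 1)   # the same step executing a second time
print(first)
print(second)
print(queue.pending)
```

With unnamed tasks the second add would simply succeed, and from then on two
copies of the chain would be running.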


Why do I suggest that tasks do not just retry immediately (or less than 30
seconds after failure), and that this was already the case before your
April 23rd e-mail?

Here are some logs showing a task retry on Feb 22nd (it's hard to find many
examples since Appengine Logs only keep error logs after a few months.. so I
need to find two errors in a row for a task to see the retry).

The task's first run was at 12:20:00.026 PM.  It ran for 29 seconds and
failed at 12:20:29.275 PM with Deadline Exceeded.. then it retried at
12:21:07.596 PM (about 38 seconds after failure):

02-22 12:21PM 07.596 /myTask 500 28548ms 306cpu_ms 160api_cpu_ms 2kb
AppEngine-Google; (+http://code.google.com/appengine)
E 02-22 12:21PM 36.140 <class
'google.appengine.runtime.DeadlineExceededError'>: Traceback (most recent
call last): File "/base/data/home/apps/myApp/1.34005759049070

02-22 12:20PM 00.026 /myTask 500 29255ms 2777cpu_ms 193api_cpu_ms 2kb
AppEngine-Google; (+http://code.google.com/appengine)
E 02-22 12:20PM 29.275 <class
'google.appengine.runtime.DeadlineExceededError'>: Traceback (most recent
call last): File "/base/data/home/apps/myApp/1.34005759049070
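
The gap between the first failure and the retry can be checked from the two
log timestamps.  A small sketch; the variable names are just for illustration
and the times of day are copied from the log lines above:

```python
from datetime import timedelta

# Times of day taken from the two log entries (both on 02-22, PM).
first_failure = timedelta(hours=12, minutes=20, seconds=29, milliseconds=275)
retry_start = timedelta(hours=12, minutes=21, seconds=7, milliseconds=596)

# Difference between the logged failure and the start of the retry.
gap_seconds = (retry_start - first_failure).total_seconds()
print("retry started %.3f seconds after the failure" % gap_seconds)
```

That works out to a bit over 38 seconds, i.e. comfortably past the 30 second
request deadline.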

The general behaviour for my app is more like.. the task will fail, and then
it will retry in 120 seconds (I have error logs showing this occurring back
in February as well.)

Maybe non-named tasks that are set to run immediately have retried on a
different timeframe in the past.. but the retry time has not just been some
generic sub-30 second time.

As for Ikai's comment, it says what it says: "The same task should not be
executed multiple times concurrently."

It does not say that the same task cannot be executed multiple times
concurrently.

Again, my money is on the reality that one cannot guarantee 100% that an
error will never occur that could lead to concurrent task execution... you
would cripple the task queue subsystem if you put in a bunch of preventative
checks.  Though, one can state with reasonable confidence that it is highly
improbable that a task will execute concurrently.  But, good luck getting a
literal answer to your question.
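
Short of a literal answer, the approach described further down the thread
(write by key_name, so a duplicate run overwrites rather than duplicates) can
be sketched roughly like this.  The dict is a hypothetical in-memory stand-in
for the datastore, and handle_task and the "result-" prefix are invented for
the example:

```python
# Hypothetical in-memory stand-in for a datastore keyed by key_name.
store = {}


def handle_task(task_name, payload):
    """Write the result under a key derived from the task name.

    Running this twice for the same task overwrites the same entry
    instead of creating a second one.
    """
    key_name = "result-" + task_name
    store[key_name] = payload
    return key_name


handle_task("chain-step-7", {"total": 42})
handle_task("chain-step-7", {"total": 42})   # duplicate execution

print(len(store))
print(store["result-chain-step-7"])
```

This only covers work that is naturally an overwrite; deletes or counters
would need something more, such as a receipt check.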


On Thu, Sep 9, 2010 at 5:26 AM, hawkett <[email protected]> wrote:

> Hi Eli, notes below -
>
> On Sep 8, 4:14 pm, Eli Jones <[email protected]> wrote:
> > Well, I've been doing named, chained tasks since November 2009, and I can
> > point out three things:
> >
>
> Task names aren't especially relevant to the question - names stop the
> same task being raised twice, not executed twice. I have been using
> the task queue since it was released, and definitely noticed tasks
> being executed more than once, but never concurrently.
>
> > 1.  I've had concurrent tasks execute at least once (that I noticed) when
> > only one was supposed to run.. And, this appeared to happen when the
> > subsystem first fired off the task (after it had already been added to the
> > queue.. since TombstonedTaskError and TaskAlreadyExistsError seem to work
> > nicely.).
> >
>
> Well, from Ikai's comment it would sound like google does not expect
> this behaviour. I raised this thread through hypothetical analysis of
> the technology, but if you have seen it happen, then that is
> especially interesting. I personally can't see how it could
> legitimately happen if it backs off for more than 30s - it would be a
> bug in the system for the task to fire duplicates when it is first
> raised, IMO. How did you determine the execution was concurrent?
>
> > 2.  The GAE doc that I linked to explicitly states "it is possible in
> > exceptional circumstances that a Task may execute multiple times".  I
> > believe that this covers both cases of the same task running concurrently or
> > sequentially.
>
> I don't think it does, but this is specifically the point of this
> thread - it is not clear. I don't want to engineer significant
> overhead into my application based on interpretation of unclear
> documentation. To me, the same task id executing at the same time in
> app engine, if it is possible, is something that needs to be
> explicitly documented, because it has significant impact on app
> architecture. Again, Ikai's comment above seems to imply Google does
> not expect this to happen. So if the documentation is unclear, and
> google seems to suggest the opposite of your interpretation, that's a
> good reason to be wary of the assumption you are making.
>
> >
> > 3.  For my failed tasks, I'm pretty sure the backoff has always been more
> > than 30 seconds (if the task failed in the middle of running).  Generally,
> > if a task failed in the middle of running, it would run again 60 seconds -
> > 120 seconds later.
> >
>
> It hasn't. Absolutely, definitely used to retry immediately and back
> off at incrementally larger intervals that were initially < 30s.
> Worked like this for quite a long while. Indeed, people other than me
> suggested this behaviour should be changed to 30s plus to deal with
> the issue in this thread. I had many, many situations where I had a
> bug in a task, and the work it generated straight after failure would
> fill up the error logs almost instantly. It was a real hassle for a
> while there, and one of the reasons why I raised this issue in June
> last year - http://code.google.com/p/googleappengine/issues/detail?id=1771
> (among a bunch of others). I wouldn't have suggested backoff should be
> changed to > 30s if it was already the case.
>
> > I can see how one would like the doc to explicitly address the potential for
> > concurrent execution.. but you should presume that it is possible since the
> > doc infers it.. and the doc doesn't say it can't happen.. and (less
> > importantly) some guy on an internet news group is telling you that it has
> > occurred in the past.
> >
>
> I don't think the docs infer it. I think it is ambiguous, especially
> in relation to Ikai's comment.
>
> > I personally cannot imagine how one could guarantee that this would never
> > happen without bogging down the entire taskqueue subsystem with triple and
> > quadruple checks and adding in random (1-3 second) wait times for exactly
> > when any task would execute.. (but, I have a limited imagination).. and it
> > seems like even then.. you cannot guarantee 100% that a task would not
> > execute twice at once if a drastic system error occurred.
>
> Executing twice is fine, I get that. Executing the same task id
> concurrently seems to be something that can be avoided - I don't see
> anything other than the 30s+ backoff being required to achieve this.
> Maybe that's wrong, but it's sufficient for me, and was the suggestion
> I made to address it. Unless someone highlights another reason why it
> could occur, I'm glad to avoid the additional architecture.
>
> >
> > On Wed, Sep 8, 2010 at 4:18 AM, hawkett <[email protected]> wrote:
> > > Hi Eli,
> >
> > > Thanks for the info - the question was definitely trying to get a
> > > specific statement about whether app engine could run the same task id
> > > at the same time. Ikai's post seems to suggest that google did not
> > > think this is possible, but did not seem to address the failure
> > > scenarios I outlined.
> >
> > > It was about the time that I queried Ikai's response that re-executed
> > > tasks started backing off for a significant period (over 30s) - they
> > > used to go immediately, and then get slower and slower. e.g. 1s, 2s,
> > > 4s, 8s type behaviour. Probably co-incidence, but the fact it started
> > > happening meant that I chose to assume that concurrent tasks with the
> > > same id could not occur. As you can see in the above thread, I had
> > > suggested backing off for more than 30s as a solution.
> >
> > > I agree that the problem is making sure you know how idempotent your
> > > operations need to be, which is specifically why it is important to
> > > have a definitive statement from google as to whether this
> > > concurrent execution can occur or not. Without that information, I
> > > don't know how idempotent my operations need to be. Without this
> > > information, I should probably be assuming concurrent execution *can*
> > > occur, but I'm taking a risk because the overhead is so high (in my
> > > application).
> >
> > > So from my perspective, it would be a reasonable courtesy for google
> > > to comment on this thread - it is a reasonable question with some fair
> > > effort spent on articulating it, and it appears they may have fixed it
> > > in response to this thread without taking the time to say so.
> >
> > > Thanks,
> >
> > > Colin
> >
> > > On Sep 7, 5:04 pm, Eli Jones <[email protected]> wrote:
> > > > > Just in case anyone comes across this thread and is wondering about the
> > > > > potential for concurrent execution of a named task.
> >
> > > > This is documented:
> >
> > > > > http://code.google.com/appengine/docs/python/taskqueue/overview.html
> >
> > > > > The important quote is:
> >
> > > > > "When implementing the code for Tasks (as worker URLs within your app), it
> > > > > is important that you consider whether the task is idempotent. App Engine's
> > > > > Task Queue API is designed to only invoke a given task once, however it is
> > > > > possible in exceptional circumstances that a Task may execute multiple times
> > > > > (e.g. in the unlikely case of major system failure). Thus, your code must
> > > > > ensure that there are no harmful side-effects of repeated execution."
> >
> > > > > So.. again, a named task should not run more than once.. and probably will
> > > > > not run more than once.. But, there could be a major system failure that
> > > > > might result in the named task running more than once.
> >
> > > > > The "concurrent execution" problem should only come up if an error occurs in
> > > > > the system at the moment the task is executed.. and somehow two versions are
> > > > > started at the same time.
> >
> > > > > I don't know that this issue would/could come up for failed tasks that are
> > > > > then re-executed.  (I guess there could be an error that somehow indicates
> > > > > the task has failed when it really is still running... and thus the
> > > > > re-executed task begins while the old task is still running.)  But,
> > > > > re-executed tasks already seem to start well over 30 seconds after the
> > > > > purported failed task has finished.
> >
> > > > > So.. you need to figure out how idempotent you need your tasks to be.. no
> > > > > matter what.. there is no way to guarantee that a large, geographically
> > > > > distributed system like this is 110% exact at all moments.. and assuming (or
> > > > > requesting) that there is no way an exception can happen that might result
> > > > > in concurrent task execution is the wrong approach.
> >
> > > > > For my chained tasks.. I just relax my requirements and have named tasks
> > > > > that insert, update based on key_name.. and if two happen to run
> > > > > concurrently... I just get the data from the most recent insert, update..
> > > > > since earlier insert, updates get overwritten, and life goes on.
> >
> > > > On Fri, May 28, 2010 at 7:50 PM, hawkett <[email protected]> wrote:
> > > > > > Just my weekly bump on this thread. The advice from google appears to
> > > > > > be to trust that tasks with the same id cannot be running
> > > > > > concurrently. However, there are clear edge scenarios documented in
> > > > > > this thread that are not accounted for. It would be a pity if people
> > > > > > made architectural decisions based on the advice from google, and
> > > > > > discovered down the track that their data was corrupted as a result of
> > > > > > the occasional concurrent execution of the same task id. Are the edge
> > > > > > cases handled, and tasks *never* run concurrently, or is it only the
> > > > > > case that they don't run concurrently 'under normal conditions'?  If
> > > > > > there could ever be concurrent execution then it is a whole different
> > > > > > architectural scenario. Can it happen or not? By all means, if the
> > > > > > answer is that task queue is an experimental feature, 'anything's
> > > > > > possible', that would be better than tumbleweed, and infinitely better
> > > > > > than advising that concurrent execution cannot occur, when in fact
> > > > > > you're not sure that's true. Thanks,
> >
> > > > > Colin
> >
> > > > > On May 22, 9:46 am, hawkett <[email protected]> wrote:
> > > > > > > Apologies for repeatedly bumping this thread, but the advice seems to
> > > > > > > be that the same task-id *cannot* execute concurrently (100%
> > > > > > > guaranteed), but no response asserting this has addressed the failure
> > > > > > > scenario I've raised, where it would appear that the same task *may*
> > > > > > > execute concurrently unless app engine has implemented something
> > > > > > > specifically to prevent it occurring.  I know the task queue is very
> > > > > > > reliable, but not 100% so -
> > > > >
> http://groups.google.com/group/google-appengine/browse_thread/thread/..
> > > ..
> >
> > > > > > > So - in the scenario where the HTTP client (i.e. the task queue) drops
> > > > > > > the HTTP connection in an initial task execution - how does app engine
> > > > > > > prevent the recovery mechanism from executing the task a second time
> > > > > > > while the first is still running?
> >
> > > > > > > The possibility of the same task running concurrently has significant
> > > > > > > architectural implications for my app.  Does app engine handle the
> > > > > > > scenario I've outlined and prevent concurrent execution of the same
> > > > > > > task-id?
> >
> > > > > > Thanks for the clarification,
> >
> > > > > > Colin
> >
> > > > > > On May 13, 5:35 pm, "Ikai L (Google)" <[email protected]> wrote:
> >
> > > > > > > > The same task should not be executed multiple times concurrently. If it
> > > > > > > > fails, we will retry it in the future (could be back to back, but this is
> > > > > > > > not guaranteed).
> >
> > > > > > > Are you seeing evidence of the contrary?
> >
> > > > > > > > On Wed, May 12, 2010 at 12:49 PM, hawkett <[email protected]> wrote:
> > > > > > > > > Bump - still not clear whether the same task can be executing multiple
> > > > > > > > > times concurrently? I noticed that failed tasks seem to back off for
> > > > > > > > > significantly longer recently - perhaps this has helped the situation?
> > > > > > > > > Appreciate any clarification - cheers,
> >
> > > > > > > > Colin
> >
> > > > > > > > On May 1, 1:08 am, hawkett <[email protected]> wrote:
> > > > > > > > > My use case is as follows -
> >
> > > > > > > > > > 1. tasks which do not support idempotence inherently (such as deletes,
> > > > > > > > > > and some puts) carry a unique identifier, which is written as a
> > > > > > > > > > receipt in an attribute of an entity that is updated in the
> > > > > > > > > > transaction.
> > > > > > > > > > 2. When a task arrives carrying a receipt, I check that it does not
> > > > > > > > > > already exist - so receipted tasks incur an additional, key only, db
> > > > > > > > > > read
> >
> > > > > > > > > > This is essentially my algorithm for ensuring idempotence (in
> > > > > > > > > > situations where it is not inherent) - ignore subsequent executions.
> >
> > > > > > > > > > If the same task *cannot* be running in parallel, then the check for
> > > > > > > > > > the receipt can be done outside the transaction that writes the
> > > > > > > > > > receipt - which has a couple of advantages -
> >
> > > > > > > > > > a. It can be done up front in the task handler, so I don't have to go
> > > > > > > > > > all the way through to the...
> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected]<google-appengine%[email protected]>
> .
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
>

