Well, I've been doing named, chained tasks since November 2009, and I can
point out three things:

1.  I've had concurrent tasks execute at least once (that I noticed) when
only one was supposed to run. This appeared to happen when the subsystem
first fired off the task (after it had already been added to the queue),
since TombstonedTaskError and TaskAlreadyExistsError seem to work nicely.

2.  The GAE doc that I linked to explicitly states "it is possible in
exceptional circumstances that a Task may execute multiple times". I
believe that this covers both cases: the same task running concurrently or
sequentially.

3.  For my failed tasks, I'm pretty sure the backoff has always been more
than 30 seconds (if the task failed in the middle of running). Generally,
if a task failed in the middle of running, it would run again 60-120
seconds later.
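To illustrate point 1's enqueue-side behaviour - a name rejected while still queued (TaskAlreadyExistsError) or after execution (TombstonedTaskError) - here is a toy plain-Python simulation of those semantics. This is not the App Engine API; the class and method names are invented for illustration, and note that this deduplication happens at *add* time, which by itself says nothing about a task executing twice:

```python
class TombstonedTaskError(Exception):
    """Raised when a task name belongs to an already-executed task."""

class TaskAlreadyExistsError(Exception):
    """Raised when a task with the same name is still in the queue."""

class FakeQueue:
    """Toy model of enqueue-side deduplication for named tasks."""

    def __init__(self):
        self.pending = set()     # names currently waiting in the queue
        self.tombstones = set()  # names of tasks that have already run

    def add(self, name):
        if name in self.tombstones:
            raise TombstonedTaskError(name)
        if name in self.pending:
            raise TaskAlreadyExistsError(name)
        self.pending.add(name)

    def run(self, name):
        # Running a task tombstones its name, so the same name
        # can never be enqueued again.
        self.pending.discard(name)
        self.tombstones.add(name)

q = FakeQueue()
q.add("report-2010-09-08")
q.run("report-2010-09-08")
try:
    q.add("report-2010-09-08")  # re-enqueue of an executed name is rejected
except TombstonedTaskError:
    print("deduplicated")
```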

I can see how one would like the doc to explicitly address the potential for
concurrent execution, but you should presume that it is possible: the doc
implies it, the doc doesn't say it can't happen, and (less importantly) some
guy on an internet newsgroup is telling you that it has occurred in the
past.

I personally cannot imagine how one could guarantee that this would never
happen without bogging down the entire taskqueue subsystem with triple and
quadruple checks and adding random (1-3 second) waits before any task would
execute (but I have a limited imagination). And it seems like even then you
could not guarantee 100% that a task would not execute twice at once if a
drastic system error occurred.

On Wed, Sep 8, 2010 at 4:18 AM, hawkett <[email protected]> wrote:

> Hi Eli,
>
> Thanks for the info - the question was definitely trying to get a
> specific statement about whether app engine could run the same task id
> at the same time. Ikai's post seems to suggest that google did not
> think this is possible, but did not seem to address the failure
> scenarios I outlined.
>
> It was about the time that I queried Ikai's response that re-executed
> tasks started backing off for a significant period (over 30s) - they
> used to go immediately, and then get slower and slower, e.g. 1s, 2s,
> 4s, 8s type behaviour. Probably coincidence, but the fact it started
> happening meant that I chose to assume that concurrent tasks with the
> same id could not occur. As you can see in the above thread, I had
> suggested backing off for more than 30s as a solution.
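The 1s, 2s, 4s, 8s pattern described above is plain exponential doubling; a sketch of such a schedule with a cap (the base and the 30s cap here are illustrative, not App Engine's documented retry parameters):

```python
def backoff_schedule(retries, base=1.0, cap=30.0):
    """Delay before each retry attempt: base * 2**n, capped at `cap`."""
    return [min(base * 2 ** n, cap) for n in range(retries)]

print(backoff_schedule(7))  # → [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```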
>
> I agree that the problem is making sure you know how idempotent your
> operations need to be, which is specifically why it is important to
> have a definitive statement from google as to whether this concurrent
> execution can occur or not. Without that information, I don't know how
> idempotent my operations need to be, and I should probably be assuming
> concurrent execution *can* occur - but I'm taking a risk because the
> overhead is so high (in my application).
>
> So from my perspective, it would be a reasonable courtesy for google
> to comment on this thread - it is a reasonable question with some fair
> effort spent on articulating it, and it appears they may have fixed it
> in response to this thread without taking the time to say so.
>
> Thanks,
>
> Colin
>
> On Sep 7, 5:04 pm, Eli Jones <[email protected]> wrote:
> > Just in case anyone comes across this thread and is wondering about the
> > potential for concurrent execution of a named task.
> >
> > This is documented:
> >
> > http://code.google.com/appengine/docs/python/taskqueue/overview.html
> >
> > The important quote is:
> >
> > "When implementing the code for Tasks (as worker URLs within your app),
> it
> > is important that you consider whether the task is idempotent. App
> Engine's
> > Task Queue API is designed to only invoke a given task once, however it
> is
> > possible in exceptional circumstances that a Task may execute multiple
> times
> > (e.g. in the unlikely case of major system failure). Thus, your code must
> > ensure that there are no harmful side-effects of repeated execution."
> >
> > So, again: a named task should not run more than once, and probably will
> > not run more than once. But there could be a major system failure that
> > might result in the named task running more than once.
> >
> > The "concurrent execution" problem should only come up if an error
> > occurs in the system at the moment the task is executed, and somehow two
> > versions are started at the same time.
> >
> > I don't know that this issue would or could come up for failed tasks
> > that are then re-executed. (I guess there could be an error that somehow
> > indicates the task has failed when it really is still running, and thus
> > the re-executed task begins while the old task is still running.) But
> > re-executed tasks already seem to start well over 30 seconds after the
> > purported failed task has finished.
> >
> > So you need to figure out how idempotent your tasks need to be, no
> > matter what. There is no way to guarantee that a large, geographically
> > distributed system like this is 110% exact at all moments, and assuming
> > (or requesting) that there is no way an exception can happen that might
> > result in concurrent task execution is the wrong approach.
> >
> > For my chained tasks, I just relax my requirements and have named tasks
> > that insert or update based on key_name. If two happen to run
> > concurrently, I just get the data from the most recent insert or update,
> > since earlier ones get overwritten, and life goes on.
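A minimal sketch of that last-write-wins style, using a plain dict as a stand-in for datastore entities keyed by key_name (the names here are illustrative):

```python
# Plain dict as a stand-in for datastore entities keyed by key_name.
store = {}

def task_update(key_name, value):
    """Re-running the "same" task just overwrites the same key."""
    store[key_name] = value

# Two executions of the same named task: the later write simply
# replaces the earlier one, so the most recent result wins.
task_update("row-42", {"price": 10})
task_update("row-42", {"price": 11})
```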
> >
> >
> >
> > On Fri, May 28, 2010 at 7:50 PM, hawkett <[email protected]> wrote:
> > > Just my weekly bump on this thread. The advice from google appears to
> > > be to trust that tasks with the same id cannot be running
> > > concurrently. However, there are clear edge scenarios documented in
> > > this thread that are not accounted for. It would be a pity if people
> > > made architectural decisions based on the advice from google, and
> > > discovered down the track that their data was corrupted as a result of
> > > the occasional concurrent execution of the same task id. Are the edge
> > > cases handled, and tasks *never* run concurrently, or is it only the
> > > case that they don't run concurrently 'under normal conditions'?  If
> > > there could ever be concurrent execution then it is a whole different
> > > architectural scenario. Can it happen or not? By all means, if the
> > > answer is that task queue is an experimental feature, 'anything's
> > > possible', that would be better than tumbleweed, and infinitely better
> > > than advising that concurrent execution cannot occur, when in fact
> > > you're not sure that's true. Thanks,
> >
> > > Colin
> >
> > > On May 22, 9:46 am, hawkett <[email protected]> wrote:
> > > > Apologies for repeatedly bumping this thread. The advice seems to
> > > > be that the same task-id *cannot* execute concurrently (100%
> > > > guaranteed), but no response asserting this has addressed the
> > > > failure scenario I've raised, where it would appear that the same
> > > > task *may* execute concurrently unless app engine has implemented
> > > > something specifically to prevent it occurring. I know the task
> > > > queue is very reliable, but not 100% so -
> > >http://groups.google.com/group/google-appengine/browse_thread/thread/...
> >
> > > > So - in the scenario where the HTTP client (i.e. the task queue)
> > > > drops the HTTP connection in an initial task execution - how does
> > > > app engine prevent the recovery mechanism from executing the task a
> > > > second time while the first is still running?
> >
> > > > The possibility of the same task running concurrently has significant
> > > > architectural implications for my app.  Does app engine handle the
> > > > scenario I've outlined and prevent concurrent execution of the same
> > > > task-id?
> >
> > > > Thanks for the clarification,
> >
> > > > Colin
> >
> > > > On May 13, 5:35 pm, "Ikai L (Google)" <[email protected]> wrote:
> >
> > > > > The same task should not be executed multiple times concurrently.
> > > > > If it fails, we will retry it in the future (could be back to
> > > > > back, but this is not guaranteed).
> >
> > > > > Are you seeing evidence of the contrary?
> >
> > > > > On Wed, May 12, 2010 at 12:49 PM, hawkett <[email protected]> wrote:
> > > > > > Bump - still not clear whether the same task can be executing
> > > > > > multiple times concurrently? I noticed that failed tasks seem to
> > > > > > back off for significantly longer recently - perhaps this has
> > > > > > helped the situation? Appreciate any clarification - cheers,
> >
> > > > > > Colin
> >
> > > > > > On May 1, 1:08 am, hawkett <[email protected]> wrote:
> > > > > > > My use case is as follows -
> >
> > > > > > > 1. Tasks which do not support idempotence inherently (such as
> > > > > > > deletes, and some puts) carry a unique identifier, which is
> > > > > > > written as a receipt in an attribute of an entity that is
> > > > > > > updated in the transaction.
> > > > > > > 2. When a task arrives carrying a receipt, I check that it
> > > > > > > does not already exist - so receipted tasks incur an
> > > > > > > additional, key-only db read.
> >
> > > > > > > This is essentially my algorithm for ensuring idempotence (in
> > > > > > > situations where it is not inherent) - ignore subsequent
> > > > > > > executions.
> >
> > > > > > > If the same task *cannot* be running in parallel, then the
> > > > > > > check for the receipt can be done outside the transaction that
> > > > > > > writes the receipt - which has a couple of advantages -
> >
> > > > > > > a. It can be done up front in the task handler, so I don't
> > > > > > > have to go all the way through to the transactional write
> > > > > > > before discovering it already executed.
> > > > > > > b. More importantly, I can reduce the work done inside the
> > > > > > > transaction - every extra millisecond spent in the transaction
> > > > > > > locks the entity group, and at scale, those milliseconds can
> > > > > > > add up - especially on entity groups that are somewhat write
> > > > > > > intensive.
> >
> > > > > > > If the same task *can* be running in parallel, then I need to
> > > > > > > do the receipt read inside the transaction that writes it. It
> > > > > > > would be a pity to do that extra work in every transaction for
> > > > > > > a very rare scenario.
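The receipt scheme above can be sketched as follows. This is a plain-Python stand-in: the `receipts` set plays the role of the persisted receipt attribute, and the function body stands in for the datastore transaction. As the thread notes, if the same task can run concurrently, the check and the write must share one transaction:

```python
# Stand-in for receipt attributes persisted alongside the entity.
receipts = set()

def apply_task(receipt_id, mutate):
    """Apply `mutate` at most once per receipt_id; return True if applied.

    If the same task can run concurrently, this check-then-write must
    happen inside one transaction: otherwise two copies could both pass
    the check before either records the receipt.
    """
    if receipt_id in receipts:
        return False  # receipt already recorded - ignore the duplicate
    mutate()
    receipts.add(receipt_id)
    return True

counter = [0]
apply_task("abc-123", lambda: counter.__setitem__(0, counter[0] + 1))
apply_task("abc-123", lambda: counter.__setitem__(0, counter[0] + 1))
print(counter[0])  # incremented exactly once despite two deliveries
```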
> >
> > > > > > > As stated earlier, it seems that it might be possible for GAE
> > > > > > > to guarantee that it does not execute the same task in
> > > > > > > parallel - by ensuring that, for error scenarios like those
> > > > > > > above (408, client crash, perhaps others), the 2nd execution
> > > > > > > waits 30 seconds. That has some obvious downsides, but given
> > > > > > > how rarely it occurs, and given that an app shouldn't be
> > > > > > > relying on the speed with which a task is executed, it seems
> > > > > > > like a reasonable trade-off to get a reduction in
> > > > > > > transactional work for the vast majority of the time - less
> > > > > > > contention, less CPU, less datastore activity.
> >
> > > > > > > A simple example is a task which increments a counter - we
> > > > > > > don't want to increment the counter twice.
> >
> > > > > > > The problem is the same whether one or many entities are
> > > > > > > being updated during handling of the task.
> >
> > > > > > > Do you have many situations where you perform a read that
> > > > > > > does not result in some sort of update - db update, another
> > > > > > > task raised, email sent, external system notified, etc.?
> > > > > > > There's a subset of most of these that we want to avoid doing
> > > > > > > twice. It's the multiple writes, rather than multiple reads,
> > > > > > > causing issues.
> >
> > > > > > > Anyone from google able to end the speculation? :)
> >
> > > > > > > On Apr 30, 2:31 am, Eli Jones <[email protected]> wrote:
> >
> > > > > > > > In my opinion, the case you are asking about is pretty much
> > > > > > > > the reason they state that tasks must be idempotent, even
> > > > > > > > with named tasks.
> >
> > > > > > > > They cannot guarantee 100% that some transient error will
> > > > > > > > not occur when a scheduled task is executed (even if you are
> > > > > > > > naming tasks and are guaranteed 100% that your task will not
> > > > > > > > be added to the queue more than once).
> >
> > > > > > > > So, it is possible to have more than one version of the
> > > > > > > > "same" task executing at the same time. You just need to
> > > > > > > > construct your tasks so they aren't doing too much at once
> > > > > > > > (e.g. reading some data, then updating or inserting, then
> > > > > > > > reading other data and updating some more), or you need to
> > > > > > > > make sure to do all of that inside a big transaction - and,
> > > > > > > > even then, you still need to ensure idempotence.
> >
> > > > > > > > I sort of prefer a poor man's version of idempotence for my
> > > > > > > > chained tasks. Mainly, if the "same" task runs more than
> > > > > > > > once, each version will have a potentially different result,
> > > > > > > > but I am perfectly happy getting the result from the task
> > > > > > > > that ran last. I can easily accept this since my tasks are
> > > > > > > > not doing multiple updates at once, and they are not reading
> > > > > > > > from the same entities that they are updating.
> >
> > > > > > > > What is your exact use case?
> >
> > > > > > > > On Thu, Apr 29, 2010 at 7:28 PM, hawkett <[email protected]>
> > > > > > > > wrote:
> > > > > > > > > Thanks for the response - it's good to know that the
> > > > > > > > > multiple executions cannot occur in parallel, although I'm
> > > > > > > > > not sure I completely...
> >
>
> --
> You received this message because you are subscribed to the Google Groups
> "Google App Engine" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/google-appengine?hl=en.
>
>

