[google-appengine] Re: (mostly) Consistent 20-second delay in starting backend tasks

stevep Mon, 06 Feb 2012 07:08:55 -0800

Thanks Nick. Been on the forums for years, and that's the first I
remember having a GAE engineer delve into TQs at this level. (BTW
Carter replied already about his 20 minute starts).


Maybe over time TQ scheduling might expand to enhance multi-instance
optimization, and provide a means for a priority queue (with
constraints such as volume capacity).

-stevep

On Feb 5, 3:59 pm, Nicholas Verne <[email protected]> wrote:
> We would have no need to shoot anyone.
>
> However, the explanations quickly become obsolete. They enter the
> folklore in the form that was current at the time and become
> entrenched as incorrect information when the implementations have
> changed.
>
> Task Queues use best effort scheduling. They're not real time all the
> time, although when our best efforts are running smoothly they can
> appear real time. For scheduling, the task eta marks the earliest time
> at which the task can run. We can't guarantee that a task WILL run at
> that time.
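
Since the ETA described above is only a lower bound on the start time, the useful number to track is how far past its ETA a task actually started. The following is a minimal sketch in plain Python. `task_lateness` is a hypothetical helper, not part of the Task Queue API, and the timestamps are illustrative values rather than data from this thread.

```python
from datetime import datetime, timedelta


def task_lateness(eta, started):
    """Return how long after its ETA a task actually started.

    The ETA is the earliest time the task may run, so a start before
    the ETA is not expected; clamp to zero rather than go negative.
    """
    return max(started - eta, timedelta(0))


# Illustrative values only: a task that started 20 minutes after its ETA.
eta = datetime(2012, 2, 5, 9, 0, 0)
started = eta + timedelta(minutes=20)
print(task_lateness(eta, started))
```

Logging this value per task would show whether delays are occasional or systematic.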
>
> Steve, we're interested to know about the 10-20 minute delays you've
> seen. Can you tell us the app id, queue, and whether the tasks were
> added transactionally? An example from your logs would be very
> helpful.
>
> Nick Verne
>
> On Mon, Feb 6, 2012 at 9:27 AM, stevep <[email protected]> wrote:
> > Carter wrote: We regularly but erratically see 10-20 minute delays in
> > running push queue tasks.
>
> > Been a burr under the saddle forever. What I really don't understand
> > -- assuming GAE engineers never see the benefit of providing at least
> > one priority/reliability queue -- is why the heck there is never any
> > explanation about how tasks get scheduled, and why these weird delays
> > happen. It is either: 1) If we told you we would have to shoot you, or
> > 2) We can't see the benefit of you understanding this.
>
> > -stevep
>
> > On Feb 5, 9:24 am, Carter <[email protected]> wrote:
> >> We regularly but erratically see 10-20 minute delays in running push
> >> queue tasks.
> >> The tasks sit in the queue with ETA as high as 20 minutes *ago*
> >> without any errors or retries.
>
> >> (the problem seems unrelated to queue settings since our Maximum Rate,
> >> Enforced Rate and Maximum Concurrent all far exceed the queue's
> >> throughput at the time of the delays)
>
> >> Any tips or clues on how to prevent this while still using push queues
> >> without backends?
>
> >> On Feb 1, 9:03 pm, Robert Kluin <[email protected]> wrote:
>
> >> > Hey Dave,
> >> >   Hopefully Nick will be able to offer some insight into the cause of
> >> > your issues.  I'd guess it is something related to having very few
> >> > tasks (one) in the queue, and it not getting scheduled rapidly.
>
> >> >   In your case, you could use pull queues to immediately fetch the
> >> > next task when finished with a task.  Or even to fetch multiple tasks
> >> > and do the work in parallel.  Basically you'd have a backend that ran
> >> > a loop (possibly initiated via a push task) that would lease a task,
> >> > or tasks, from the pull queue, do the work, delete those tasks, then
> >> > repeat from the lease stage.  The cool thing is that if you're, for
> >> > example, using URL Fetch to pull data, this might let you do it in
> >> > parallel without increasing your costs much (if any).
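
The lease, work, delete, repeat loop described above can be sketched as follows. This is an illustration under stated assumptions, not code from the thread. On App Engine the `queue` argument would be a `taskqueue.Queue` from `google.appengine.api`, which provides `lease_tasks(lease_seconds, max_tasks)` and `delete_tasks(...)`. To keep the sketch runnable anywhere, it uses `FakePullQueue`, a hypothetical in-memory stand-in, and `handle` is a hypothetical per-task callback. The sketch processes tasks one at a time; the parallel variant is not shown.

```python
def drain_pull_queue(queue, handle, lease_seconds=60, max_tasks=10):
    """Lease tasks, process them, delete them, and repeat until none are left.

    `queue` is any object with lease_tasks(lease_seconds, max_tasks) and
    delete_tasks(tasks) methods; `handle` does the work for one task.
    """
    processed = 0
    while True:
        tasks = queue.lease_tasks(lease_seconds, max_tasks)
        if not tasks:
            return processed
        for task in tasks:
            handle(task)
        # Delete only after the work is done, so that a failure in
        # handle() leaves the tasks undeleted.
        queue.delete_tasks(tasks)
        processed += len(tasks)


class FakePullQueue(object):
    """In-memory stand-in for a pull queue (assumption, for local runs only).

    It ignores lease_seconds and never re-offers leased tasks.
    """

    def __init__(self, payloads):
        self._pending = list(payloads)
        self._leased = []

    def lease_tasks(self, lease_seconds, max_tasks):
        batch = self._pending[:max_tasks]
        self._pending = self._pending[max_tasks:]
        self._leased.extend(batch)
        return batch

    def delete_tasks(self, tasks):
        for task in tasks:
            self._leased.remove(task)


done = []
count = drain_pull_queue(FakePullQueue(["a", "b", "c"]), done.append, max_tasks=2)
```

A long-running backend would probably sleep and lease again when the queue is empty instead of returning; returning keeps the sketch finite.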
>
> >> > Robert
>
> >> > On Wed, Feb 1, 2012 at 14:25, Dave Loomer <[email protected]> wrote:
> >> > > Here are logs from three consecutive task executions over the past
> >> > > weekend, with only identifying information removed. You'll see that
> >> > > each task completes in a few milliseconds, but they run 20 seconds
> >> > > apart (remember: I've already checked my queue configurations,
> >> > > nothing else is running on this backend, and I later solved the
> >> > > problem by setting countdown=1 when adding the task).  I don't see
> >> > > any pending latency mentioned.
>
> >> > > 0.1.0.2 - - [27/Jan/2012:18:33:20 -0800] 200 124 ms=10 cpu_ms=47
> >> > > api_cpu_ms=0 cpm_usd=0.000060 queue_name=overnight-tasks
> >> > > task_name=15804554889304913211 instance=0
> >> > > 0.1.0.2 - - [27/Jan/2012:18:33:00 -0800] 200 124 ms=11 cpu_ms=0
> >> > > api_cpu_ms=0 cpm_usd=0.000060 queue_name=overnight-tasks
> >> > > task_name=15804554889304912461 instance=0
> >> > > 0.1.0.2 - - [27/Jan/2012:18:32:41 -0800] 200 124 ms=26 cpu_ms=0
> >> > > api_cpu_ms=0 cpm_usd=0.000060 queue_name=overnight-tasks
> >> > > task_name=4499136807998063691 instance=0
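
The spacing between the three log entries above can be checked mechanically. The following is a small sketch in plain Python that extracts the request timestamps and computes the gaps between consecutive executions. `request_times` and `gaps_in_seconds` are hypothetical helper names, and `LOG_LINES` holds only the leading portion of each entry, copied from the logs above.

```python
import re
from datetime import datetime

# Leading portion of each log entry quoted above.
LOG_LINES = [
    "0.1.0.2 - - [27/Jan/2012:18:33:20 -0800] 200 124 ms=10 cpu_ms=47",
    "0.1.0.2 - - [27/Jan/2012:18:33:00 -0800] 200 124 ms=11 cpu_ms=0",
    "0.1.0.2 - - [27/Jan/2012:18:32:41 -0800] 200 124 ms=26 cpu_ms=0",
]

TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def request_times(lines):
    """Extract the request timestamp from each log line, oldest first."""
    times = []
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            # All entries share the same UTC offset, so it is ignored here.
            times.append(datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S"))
    return sorted(times)


def gaps_in_seconds(times):
    """Seconds between each pair of consecutive timestamps."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


gaps = gaps_in_seconds(request_times(LOG_LINES))
print(gaps)
```

For these three entries the gaps are 19 and 20 seconds, while each request reports only 10 to 26 ms of processing time.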
>
> >> > > The 20-second delay seems to happen regardless of the length of the
> >> > > task. Even though my tasks mostly complete in a couple minutes, I do
> >> > > have cases where they take several minutes, and I don't see a
> >> > > difference. Of course, when a task takes 5-10 minutes to complete,
> >> > > I'm going to notice and care about a 20-second delay much less than
> >> > > when I'm trying to spin through a few tasks in a minute (which is a
> >> > > real-world need for me as well).
>
> >> > > When reading up on pull queues a while back, I was a little confused
> >> > > about where I would use them with my own backends. I definitely could
> >> > > see an application for offloading work to an AWS Linux instance. But
> >> > > in either case, could you explain why it might help?
>
> >> > > I saw you mention in a separate thread how M/S can perform
> >> > > differently from HRD even in cases where one wouldn't expect to see a
> >> > > difference. When I get around to it I'm going to create a tiny HRD
> >> > > app and run the same tests through that.
>
> >> > > I also wonder if M/S could be responsible for frequent latencies in
> >> > > my admin console. Those have gotten more frequent and annoying the
> >> > > past couple of months ...
>
> >> > > --
> >> > > You received this message because you are subscribed to the Google
> >> > > Groups "Google App Engine" group.
> >> > > To view this discussion on the web visit
> >> > > https://groups.google.com/d/msg/google-appengine/-/lbNQRQdSx0AJ.
>
> >> > > To post to this group, send email to [email protected].
> >> > > To unsubscribe from this group, send email to
> >> > > [email protected].
> >> > > For more options, visit this group at
> >> > > http://groups.google.com/group/google-appengine?hl=en.
>

