Re: [google-appengine] Re: (mostly) Consistent 20-second delay in starting backend tasks

Robert Kluin Sun, 05 Feb 2012 22:56:49 -0800

Does the app get a lot of front-end traffic or is it sitting idle when
the delays occur?





On Mon, Feb 6, 2012 at 01:38, Carter Maslan <[email protected]> wrote:
> I just looked at the last 80 that ran.
> That queue's tasks run in between 19ms and 2486ms, with most of them
> running around 28ms.  The variability relates to the number of quadtree
> searches needed, but other queues that experience similar delays have running
> times without much variation (e.g. predictable counter updates).
> When the delays happen, there just aren't many tasks in the queue at all.
> It appears that the delayed tasks are just sitting in the queue idle.
>
>
>
> On Feb 5, 2012, at 9:17 PM, Robert Kluin <[email protected]> wrote:
>
>> That's interesting.  Did the queue sit there for a long time not
>> running anything, or running tasks very slowly?  Are the tasks in that
>> queue generally long-running?
>>
>> I _very_ infrequently bump into that type of issue, but I periodically
>> will see one queue slow down for a while.  It *seems* to happen far
>> more often in queues with slower tasks, but I don't have any recent
>> empirical evidence of that.  And I *think* I've been told that should
>> not be the case.
>>
>>
>> Robert
>>
>>
>>
>> On Sun, Feb 5, 2012 at 19:27, Carter Maslan <[email protected]> wrote:
>>> Nicholas -
>>>
>>> For our examples of the 10-20 minute delay:
>>> app_id=s~camiologger
>>> queue=image-label
>>> (but several other queues experience the same long delays sometimes:
>>> content-process, counter-update, etc...)
>>>
>>> The tasks were not added with transactions; just this code:
>>> // withUrl is statically imported from TaskOptions.Builder
>>> Queue queueP =
>>>     QueueFactory.getQueue(ServerUtils.QUEUE_NAME_IMAGE_LABEL_PUSH);
>>> TaskHandle th = queueP.add(withUrl(ServerUtils.PATH_ADMIN_MOTION_LABEL)
>>>     .param("key", contentKeyString)
>>>     .method(TaskOptions.Method.GET));
>>>
>>>
>>> Let me know if you need more info.  We noticed this in the last few weeks.
>>> Carter
>>>
>>>
>>>
>>> On Sun, Feb 5, 2012 at 4:05 PM, Dave Loomer <[email protected]> wrote:
>>>>
>>>> As the OP you may be interested in my app ID as well: mn-live.  I
>>>> provided some logs a few posts back and some exhaustive details at the
>>>> beginning.
>>>>
>>>> However, you won't see this issue popping up anymore on my app since I
>>>> "solved" it by setting countdown=1 a week ago. Since then, tasks start
>>>> very reliably after a 1.5 second delay.  If I remove the countdown
>>>> parameter, then it returns to 20 seconds (+/- .01) pretty reliably.
>>>>
>>>> On Feb 5, 5:59 pm, Nicholas Verne <[email protected]> wrote:
>>>>> We would have no need to shoot anyone.
>>>>>
>>>>> However, the explanations quickly become obsolete. They enter the
>>>>> folklore in the form that was current at the time and become
>>>>> entrenched as incorrect information when the implementations have
>>>>> changed.
>>>>>
>>>>> Task Queues use best effort scheduling. They're not real time all the
>>>>> time, although when our best efforts are running smoothly they can
>>>>> appear real time. For scheduling, the task eta marks the earliest time
>>>>> at which the task can run. We can't guarantee that a task WILL run at
>>>>> that time.
>>>>>
>>>>> Steve, we're interested to know about the 10-20 minute delays you've
>>>>> seen. Can you tell us the app id, queue, and whether the tasks were
>>>>> added transactionally? An example from your logs would be very
>>>>> helpful.
>>>>>
>>>>> Nick Verne
>>>>>
>>>>> On Mon, Feb 6, 2012 at 9:27 AM, stevep <[email protected]> wrote:
>>>>>> Carter wrote: We regularly but erratically see 10-20 minute delays in
>>>>>> running push queue tasks.
>>>>>
>>>>>> Been a burr under the saddle forever. What I really don't understand
>>>>>> -- assuming GAE engineers never see the benefit of providing at least
>>>>>> one priority/reliability queue -- is why the heck there is never any
>>>>>> explanation about how tasks get scheduled, and why these weird delays
>>>>>> happen. It is either: 1) If we told you we would have to shoot you, or
>>>>>> 2) We can't see the benefit of you understanding this.
>>>>>
>>>>>> -stevep
>>>>>
>>>>>> On Feb 5, 9:24 am, Carter <[email protected]> wrote:
>>>>>>> We regularly but erratically see 10-20 minute delays in running push
>>>>>>> queue tasks.
>>>>>>> The tasks sit in the queue with ETA as high as 20 minutes *ago*
>>>>>>> without any errors or retries.
>>>>>
>>>>>>> (the problem seems unrelated to queue settings since our Maximum Rate,
>>>>>>> Enforced Rate and Maximum Concurrent all far exceed the queue's
>>>>>>> throughput at the time of the delays)
>>>>>
>>>>>>> Any tips or clues on how to prevent this while still using push queues
>>>>>>> without backends?
>>>>>
>>>>>>> On Feb 1, 9:03 pm, Robert Kluin <[email protected]> wrote:
>>>>>
>>>>>>>> Hey Dave,
>>>>>>>>   Hopefully Nick will be able to offer some insight into the cause of
>>>>>>>> your issues.  I'd guess it is something related to having very few
>>>>>>>> tasks (one) in the queue, and it not getting scheduled rapidly.
>>>>>
>>>>>>>>   In your case, you could use pull queues to immediately fetch the
>>>>>>>> next task when finished with a task.  Or even to fetch multiple tasks
>>>>>>>> and do the work in parallel.  Basically you'd have a backend that ran
>>>>>>>> a loop (possibly initiated via a push task) that would lease a task,
>>>>>>>> or tasks, from the pull queue, do the work, delete those tasks, then
>>>>>>>> repeat from the lease stage.  The cool thing is that if you're, for
>>>>>>>> example, using URL Fetch to pull data, this might let you do it in
>>>>>>>> parallel without increasing your costs much (if any).
>>>>>
>>>>>>>> Robert
>>>>>
>>>>>>>> On Wed, Feb 1, 2012 at 14:25, Dave Loomer <[email protected]> wrote:
>>>>>>>>> Here are logs from three consecutive task executions over the past
>>>>>>>>> weekend, with only identifying information removed. You'll see that
>>>>>>>>> each task completes in a few milliseconds, but the executions are 20
>>>>>>>>> seconds apart (remember: I've already checked my queue
>>>>>>>>> configurations, nothing else is running on this backend, and I later
>>>>>>>>> solved the problem by setting countdown=1 when adding the task).  I
>>>>>>>>> don't see any pending latency mentioned.
>>>>>
>>>>>>>>> 0.1.0.2 - - [27/Jan/2012:18:33:20 -0800] 200 124 ms=10 cpu_ms=47
>>>>>>>>> api_cpu_ms=0 cpm_usd=0.000060 queue_name=overnight-tasks
>>>>>>>>> task_name=15804554889304913211 instance=0
>>>>>>>>> 0.1.0.2 - - [27/Jan/2012:18:33:00 -0800] 200 124 ms=11 cpu_ms=0
>>>>>>>>> api_cpu_ms=0 cpm_usd=0.000060 queue_name=overnight-tasks
>>>>>>>>> task_name=15804554889304912461 instance=0
>>>>>>>>> 0.1.0.2 - - [27/Jan/2012:18:32:41 -0800] 200 124 ms=26 cpu_ms=0
>>>>>>>>> api_cpu_ms=0 cpm_usd=0.000060 queue_name=overnight-tasks
>>>>>>>>> task_name=4499136807998063691 instance=0
>>>>>
>>>>>>>>> The 20 seconds seems to happen regardless of length of task. Even
>>>>>>>>> though my tasks mostly complete in a couple minutes, I do have cases
>>>>>>>>> where they take several minutes, and I don't see a difference. Of
>>>>>>>>> course, when a task takes 5-10 minutes to complete, I'm going to
>>>>>>>>> notice and care about a 20-second delay much less than when I'm
>>>>>>>>> trying to spin through a few tasks in a minute (which is a
>>>>>>>>> real-world need for me as well).
>>>>>
>>>>>>>>> When reading up on pull queues a while back, I was a little confused
>>>>>>>>> about where I would use them with my own backends. I definitely
>>>>>>>>> could see an application for offloading work to an AWS Linux
>>>>>>>>> instance. But in either case, could you explain why it might help?
>>>>>
>>>>>>>>> I saw you mention in a separate thread how M/S can perform
>>>>>>>>> differently from HRD even in cases where one wouldn't expect to see
>>>>>>>>> a difference. When I get around to it I'm going to create a tiny HRD
>>>>>>>>> app and run the same tests through that.
>>>>>
>>>>>>>>> I also wonder if M/S could be responsible for frequent latencies in
>>>>>>>>> my admin console. Those have gotten more frequent and annoying the
>>>>>>>>> past couple of months ...
>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the
>>>>>>>>> Google Groups
>>>>>>>>> "Google App Engine" group.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msg/google-appengine/-/lbNQRQdSx0AJ.
>>>>>
>>>>>>>>> To post to this group, send email to
>>>>>>>>> [email protected].
>>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>>> [email protected].
>>>>>>>>> For more options, visit this group at
>>>>>>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>>

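For anyone following the pull-queue suggestion quoted above (lease a task or tasks, do the work, delete them, repeat from the lease stage), here is a minimal sketch of that loop in plain Java. It is only an illustration: InMemoryPullQueue, PullWorker, lease, delete and drain are made-up names standing in for a real pull queue, not the App Engine API. On App Engine I believe the matching calls would be the SDK's Queue.leaseTasks and Queue.deleteTask, but check the docs before relying on that.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory stand-in for a pull queue; NOT the App Engine API.
class InMemoryPullQueue {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> leased = new ArrayList<>();

    void add(String payload) {
        pending.addLast(payload);
    }

    // Hands out up to max pending tasks; they stay leased until deleted.
    List<String> lease(int max) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < max && !pending.isEmpty()) {
            String task = pending.removeFirst();
            leased.add(task);
            batch.add(task);
        }
        return batch;
    }

    // Removes a finished task from the leased set.
    void delete(String task) {
        leased.remove(task);
    }

    // Tasks that are either still pending or leased but not yet deleted.
    int outstanding() {
        return pending.size() + leased.size();
    }
}

class PullWorker {
    // Lease, work, delete, then repeat from the lease stage.
    // Returns the number of tasks processed.
    static int drain(InMemoryPullQueue queue, int batchSize) {
        int done = 0;
        while (true) {
            List<String> batch = queue.lease(batchSize);
            if (batch.isEmpty()) {
                break; // a long-running backend would sleep and poll again here
            }
            for (String task : batch) {
                process(task);
                queue.delete(task);
                done++;
            }
        }
        return done;
    }

    // Placeholder for the real work (for example a URL Fetch call).
    static void process(String task) {
    }

    public static void main(String[] args) {
        InMemoryPullQueue queue = new InMemoryPullQueue();
        for (int i = 0; i < 5; i++) {
            queue.add("task-" + i);
        }
        System.out.println("processed " + drain(queue, 2) + " tasks");
    }
}
```

The point of the design is that the worker decides when the next task starts, so there is no wait on the push scheduler between tasks. Leasing more than one task per pass is what would allow the work to be done in parallel.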

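The spacing in the three request-log lines quoted above can also be checked mechanically rather than by eye. A rough sketch in plain Java, assuming the timestamp is the bracketed field in the format shown in those lines; the class and method names (TaskGaps, gapsSeconds) are made up for illustration, and the logged time only approximates when each task actually started.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TaskGaps {
    // Timestamp layout as it appears in the quoted request logs.
    private static final DateTimeFormatter STAMP_FORMAT =
        DateTimeFormatter.ofPattern("dd/MMM/uuuu:HH:mm:ss Z", Locale.ENGLISH);

    // Captures the text between the first pair of square brackets.
    private static final Pattern STAMP = Pattern.compile("\\[([^\\]]+)\\]");

    // Returns the gaps in seconds between consecutive timestamps, oldest first.
    static List<Long> gapsSeconds(List<String> logLines) {
        List<OffsetDateTime> times = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = STAMP.matcher(line);
            if (m.find()) {
                times.add(OffsetDateTime.parse(m.group(1), STAMP_FORMAT));
            }
        }
        Collections.sort(times); // the logs are listed newest first
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < times.size(); i++) {
            gaps.add(Duration.between(times.get(i - 1), times.get(i)).getSeconds());
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "0.1.0.2 - - [27/Jan/2012:18:33:20 -0800] 200 124 ms=10 cpu_ms=47",
            "0.1.0.2 - - [27/Jan/2012:18:33:00 -0800] 200 124 ms=11 cpu_ms=0",
            "0.1.0.2 - - [27/Jan/2012:18:32:41 -0800] 200 124 ms=26 cpu_ms=0");
        System.out.println(gapsSeconds(lines));
    }
}
```

For the three quoted lines this gives gaps of 19 and 20 seconds, which matches the roughly 20-second spacing described in the thread.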