Re: [google-appengine] Weird Instance Scheduler

Mos Fri, 24 Aug 2012 09:39:58 -0700

Hello Takashi,

> Actually there were almost 8 requests in a second. So App Engine likely
> needed more than one instance at this particular moment.


I thought this is why GAE has the concept of "pending latency" (which we
discussed below).
Meaning: incoming requests may wait up to 15 seconds before a new instance
is started. Therefore, 8 requests in one second should not mean that more
instances need to be started, especially if there is no other traffic in
that minute, as seen in my example.
Otherwise it would be a very bad implementation:
Starting a new instance means around 30s of waiting time. Serving 8 parallel
requests from one instance would result in a maximum of 8 seconds for the
last request (assuming that each request takes around 1 second).
There is no reason in this concrete example to fire up more instances and
let requests wait more than 30 seconds until a new instance is loaded.
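
The arithmetic above can be written down as a rough sketch. It is only a
back-of-the-envelope comparison, not a model of the real scheduler: it assumes
that one instance serves the burst strictly one request at a time, that each
request takes 1 second, and that a loading request takes 30 seconds. The
function and constant names are made up for illustration.

```python
# Back-of-the-envelope comparison; all numbers are assumptions taken from this mail.
SERVICE_TIME_S = 1.0    # assumed time to serve one request
LOADING_TIME_S = 30.0   # assumed duration of a loading request on a new instance
MAX_PENDING_S = 15.0    # max pending latency (configured as 14.9 in this thread)


def queued_completion_times(num_requests, service_time_s):
    """Completion time of each request when one instance serves a burst one at a time."""
    return [(i + 1) * service_time_s for i in range(num_requests)]


def cold_start_completion_time(loading_time_s, service_time_s):
    """Completion time of a request that has to wait for a new instance to load."""
    return loading_time_s + service_time_s


queued = queued_completion_times(8, SERVICE_TIME_S)
longest_pending = queued[-1] - SERVICE_TIME_S   # time the last request sits in the queue
cold = cold_start_completion_time(LOADING_TIME_S, SERVICE_TIME_S)

print("last queued request done after", queued[-1], "s")
print("longest time spent pending:", longest_pending, "s")
print("request sent to a new instance done after", cold, "s")
print("queue stays under max pending latency:", longest_pending < MAX_PENDING_S)
```

Serial processing is the pessimistic case for the single instance; if the
instance handles requests concurrently, the queued numbers only get smaller.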

> ... here is what you've seen in the past weeks.
>
> * You have almost always used the 'Automatic-2' idle instance setting.
> * More than 3 weeks ago, there were very few loading requests.
> * Recently you have seen more loading requests than before.

That's right! To be even more concrete: on 16 August the problems got
significantly worse. Please check especially the period from 16 August
until today.

> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug
> 2. Can you describe what kind of changes those versions contained?

I checked it in our version control. As I wrote, no related changes were
made! Just HTML/CSS stuff:
 * One picture upload
 * One HTML change
 * One JavaScript change
 * One CSS change

> And, to be fair, we couldn't think of any change in our scheduler around 3
> weeks ago that could cause this issue.

And around 16 August?

> More than 3 weeks ago, those 2 idle instances might have had longer
> lives than now, but that was not guaranteed behavior. Please think of it this
> way: you were just kind of lucky.

That shouldn't be luck! If GAE is not able to start Java instances within 5 to
10 seconds, there needs to be a guarantee that instances have longer lives.
Otherwise Java applications on GAE are unusable, because users would face a
lot of 30-second waits (--> "failed requests"). (See also the next comment
regarding resident instances.)

> If you want some instances always active, please set min idle instances.

I tried this some days ago. I had one resident instance. But that changed
nothing. Instances got started and stopped as before. I assumed that
requests would go to the resident instance first. But that was not the
case. The resident instance was idle, but a dynamic instance got started and
the request waited 30 seconds. Please check the other discussions on this list
and the issues that report similar observations.

> As you can see, I'm still not convinced that the scheduler is
> misbehaving. I understand that your experience is a bit worse than 3 weeks
> ago, and I understand that you want to tell us 'fix it', but I'd say it's
> still something along the lines of 'expected behavior', at least for now.
> If you feel differently, please let me know.

Yes, I do feel differently (please see my answers above).

Please accept http://code.google.com/p/googleappengine/issues/detail?id=8004

Thanks
Mos
http://www.mosbase.com


On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo <[email protected]> wrote:

>
> Hi Mos,
>
> On Fri, Aug 24, 2012 at 6:05 PM, Mos <[email protected]> wrote:
>
>> > A possible explanation could be that the traffic pattern had changed.
>>
>> No. It's the same. Check for example the Request/Seconds statistics of my
>> application for the last 30 days!
>
>
>> >> It's very obvious that one instance should be enough for my
>> application. And that was almost the case the last months!
>> > Actually it's not true. In particular, check this log:
>>
>> That's one exception, where one client made 8 requests in a minute (+ one
>> Pingdom request). Nothing else in this minute.
>> In those exceptional cases it could be OK if a second instance starts.
>> (Nevertheless, can't one instance handle 8 requests a minute?)
>>
>
> The issue here is not 8 requests in a minute. Actually there were almost 8
> requests in a second. So App Engine likely needed more than one instance at
> this particular moment. Anyway, as you say, it's probably just the reason for
> one of the loading requests you're seeing, and this is not a very important
> point in this topic.
>
> It's kind of a digression, but at first glance, the Requests/Second stat
> seems an appropriate data source to discuss how many instances are actually
> needed, but in fact, it's not. Real traffic is not spread evenly.
>
>
>>
>> As I described: instances are started and stopped without reason, even
>> when there is little traffic per minute!
>
>
> Okay. As far as I understand, here is what you've seen in the past weeks.
>
> * You have almost always used the 'Automatic-2' idle instance setting.
> * More than 3 weeks ago, there were very few loading requests.
> * Recently you have seen more loading requests than before.
>
> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug
> 2. Can you describe what kind of changes those versions contained?
> I'd like to make sure that there are no changes that could cause the
> scheduler/app server to behave differently.
>
> Especially, if you want me to escalate this issue to our engineering team,
> you should provide the exact information. You say 'My application is
> unchanged', but in fact you deployed the new version on that day when you
> described the issue started. I need to make sure that there is no big
> change which can cause something bad.
>
> And, to be fair, we couldn't think of any change in our scheduler around 3
> weeks ago that could cause this issue.
>
> Secondly, you're setting max idle instances = 2. It does not guarantee
> that you have always 2 instances. It just guarantees that we will never
> charge you for more than 2 idle instances at any time.
>
> More than 3 weeks ago, those 2 idle instances might have had longer
> lives than now, but that was not guaranteed behavior. Please think of it this
> way: you were just kind of lucky. Now, presumably one or two of those
> instances are occasionally killed for some reason (there should be
> legitimate reasons, but those are something you don't need to care about).
>
> If you want some instances always active, please set min idle instances.
> Certainly it will cost you a bit more, and you will lose the pending
> queue, but considering the access pattern of your app (no bursty traffic
> except for a few accesses from the iPhone browser), I would recommend trying
> this setting in order to achieve what you want here. I'd recommend 2 idle
> instances in this case, but you should decide the number.
>
>
>> > * What is the purpose of max-pending-latency = 14.9 setting?
>>
>> " is high App Engine will allow requests to wait rather than start new
>> Instances to process them"
>> --> One attempt to stop GAE from creating unnecessary instances.
>>
>
> I think you should set min pending latency instead of max pending latency
> if you want to prevent new instances from spinning up. However, if you're
> going to set min idle instances, this setting will lose most of its effect.
> If you don't want to set any min idle instances for whatever reason, please
> consider setting min pending latency instead of max pending latency.
>
>
>>
>> > * Can you try automatic-automatic for idle instances setting?
>>
>> I played around with this over the last few days and nothing changed. As I
>> wrote: I had this configuration for months and it worked fine 3-4 weeks ago!
>>
>
>> > * What is the purpose of those Pingdom checks? What happens if you stop
>> that?
>>
>> To be alerted if GAE is down again. "What happens if you stop that?"
>> --> I wouldn't be angry anymore because I wouldn't notice the downtimes of
>> my GAE application. ;)
>>
>
>> Please forward
>> http://code.google.com/p/googleappengine/issues/detail?id=8004  to the
>> relevant GAE department.
>>
>
> As you can see, I'm still not convinced that the scheduler is
> misbehaving. I understand that your experience is a bit worse than 3 weeks
> ago, and I understand that you want to tell us 'fix it', but I'd say it's
> still something along the lines of 'expected behavior', at least for now.
>
> If you feel differently, please let me know.
>
> Regards,
>
> -- Takashi
>
>
>>
>> Thanks!
>>
>>
>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo <[email protected]> wrote:
>>
>>>
>>> Hi Mos,
>>>
>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <[email protected]> wrote:
>>>
>>>> Has anybody else experienced abnormal behavior of the
>>>> instance scheduler over the last three weeks (in the last 7 days it got even
>>>> worse)?  (Java / HRD)
>>>> Or does anybody have profound knowledge about it?
>>>>
>>>> Background:  My application has been unchanged for weeks, the configuration
>>>> has not changed, and the application's traffic is constant.
>>>> Traffic: One request per minute from Pingdom and around 200 additional
>>>> pageviews a day (== around 1500 pageviews a day). The peak is not more
>>>> than 3-4 requests per minute.
>>>>
>>>
>>> A possible explanation could be that the traffic pattern had changed.
>>>
>>>
>>>>
>>>> It's very obvious that one instance should be enough for my
>>>> application. And that was almost the case the last months!
>>>>
>>>
>>> Actually it's not true. In particular, check this log:
>>>
>>> https://appengine.google.com/logs?app_id=s~krisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>
>>> You can see the iPhone client repeatedly requests your dynamic resources
>>> in a very short amount of time. Presumably it's due to some kind of
>>> 'prefetch' feature of that device. Are you aware of those accesses, and
>>> that this access pattern can cause a new instance starting?
>>>
>>> I don't think this is the only reason, but this can explain that some
>>> portion of your loading requests are expected behavior.
>>>
>>> Now I'd like to ask you some questions.
>>>
>>>
>>> * What is the purpose of the max-pending-latency = 14.9 setting?
>>> * Can you try automatic-automatic for the idle instances setting?
>>> * What is the purpose of those Pingdom checks? What happens if you stop
>>> that?
>>>
>>>
>>>>
>>>> But now GAE creates 3 instances most of the time, one of which has a long
>>>> lifetime of days while the other ones are restarted around
>>>> 10 to 30 times a day.
>>>> Because a loading request takes between 30s and 40s and requests wait
>>>> for loading instances, there are many requests that
>>>> fail  (Users and Pingdom agree: *A request that takes more than a
>>>> couple of seconds is a failed request!*)
>>>>
>>>> Please check the attached screenshots that show the behavior!
>>>>
>>>> Note:
>>>> - Killing instances manually did not help
>>>> - Idle Instances were ( Automatic – 2 ) .  Changing it to something else,
>>>> e.g. ( Automatic – 4 ), did not change anything
>>>>
>>>> Thanks and Cheers
>>>>
>>>> Mos
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>  --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Google App Engine" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>
>>>
>>>
>>>
>>> --
>>> Takashi Matsuo | Developers Advocate | [email protected]
>>>
>>
>
>
>
> --
> Takashi Matsuo | Developers Advocate | [email protected]
>

