Re: [google-appengine] Weird Instance Scheduler

Kristopher Giesing Fri, 24 Aug 2012 12:49:32 -0700

Hi Takashi,

I ran some experiments with an instance that had requests pending only from 
my own scripts (no user-facing traffic at all).


What I found was that sending requests at about 1 req/sec, regularly spaced, 
caused GAE to spin up new instances randomly.  If I set the min instances 
setting to anything but "automatic", the very first request would cause a 
new instance to spin up. This was true even if min instances was set to some 
high number, like 8, and I waited for all 8 instances to finish launching 
before sending a request; in that case the number of instances went to 9 
on the very first request.
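The scripts used for this experiment are not included in the thread. A hypothetical sketch of such a load generator is below: it sends regularly spaced requests (about 1 req/sec) and flags responses slow enough to suggest a loading request. The function names, the example URL, and the 5-second "slow" threshold are all assumptions for illustration.

```python
import time
import urllib.request
from typing import Callable, List, Tuple


def send_schedule(rate_per_sec: float, count: int, start: float = 0.0) -> List[float]:
    """Evenly spaced send times (seconds from start) for `count` requests."""
    interval = 1.0 / rate_per_sec
    return [start + i * interval for i in range(count)]


def classify(latencies: List[float], slow_threshold: float = 5.0) -> List[bool]:
    """Mark each latency as slow (possibly a loading request) if above the threshold.
    The 5 s default is an assumed cut-off, not an App Engine value."""
    return [lat > slow_threshold for lat in latencies]


def fetch(url: str) -> float:
    """Issue one GET request and return its wall-clock latency in seconds."""
    t0 = time.monotonic()
    with urllib.request.urlopen(url, timeout=60) as resp:
        resp.read()
    return time.monotonic() - t0


def run(url: str, rate_per_sec: float = 1.0, count: int = 60,
        fetcher: Callable[[str], float] = fetch) -> List[Tuple[float, bool]]:
    """Send `count` requests at a fixed rate; return (latency, is_slow) pairs.
    Requests are sent one after another, so a slow response delays later sends."""
    t_start = time.monotonic()
    latencies: List[float] = []
    for t in send_schedule(rate_per_sec, count):
        # Sleep until this request's scheduled send time, if it is still ahead.
        delay = t_start + t - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        latencies.append(fetcher(url))
    return list(zip(latencies, classify(latencies)))
```

Calling, for example, `run("https://<your-app-id>.appspot.com/ping")` against a handler in your own app (placeholder URL) would print nothing itself but returns one `(latency, is_slow)` pair per request; the `fetcher` parameter lets the timing logic be exercised without a network.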

The only solution I found for this behavior was to package the entire app 
as a backend.

- Kris

On Friday, August 24, 2012 11:43:23 AM UTC-7, Takashi Matsuo (Google) wrote:
>
>
> Hi Mos,
>
> On Sat, Aug 25, 2012 at 1:39 AM, Mos <[email protected]> wrote:
>
>> Hello Takashi,
>>
>>
>> > Actually there were almost 8 requests in a second. So App Engine likely 
>> needed more than one instance at this particular moment.
>>
>> I thought this was why GAE has the concept of "pending latency" (which we 
>> discussed below).
>> Meaning: incoming requests may wait up to 15 seconds before a new 
>> instance is started. Therefore, 8 requests in one second should
>> not mean that another instance needs to be started, especially if 
>> there is no other traffic in that minute, as in my example.
>> Otherwise it would be a very bad implementation:
>> starting a new instance means around 30 s of waiting time. Serving 8 
>> parallel requests from one instance would result in a maximum of
>> 8 seconds for the last request (assuming that each request takes around 1 
>> second).
>> There is no reason in this concrete example to fire up more instances 
>> and let requests wait more than 30 seconds until a new instance is loaded.
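The arithmetic above can be checked with a small sketch. It assumes, as the message does, that one instance serves the 8 requests strictly one at a time, that each takes about 1 s, and that a new instance needs about 30 s to load. (Java apps can enable concurrent requests with `<threadsafe>true</threadsafe>` in appengine-web.xml, which would shorten the queued waits; that is not modeled here.) The function names are hypothetical.

```python
from typing import List


def serial_completion_times(n_requests: int, service_time: float) -> List[float]:
    """Completion time of each request when a single instance serves them
    one at a time, all requests arriving together at t=0."""
    return [(i + 1) * service_time for i in range(n_requests)]


def new_instance_completion(startup_time: float, service_time: float) -> float:
    """Completion time of a request that waits for a fresh instance to load."""
    return startup_time + service_time


queued = serial_completion_times(8, 1.0)
print(queued[-1])                          # 8.0: last of 8 queued requests
print(new_instance_completion(30.0, 1.0))  # 31.0: request routed to a new instance
```

Under these assumptions queuing bounds the worst wait at 8 s, while any request routed to a loading instance takes over 30 s; the real scheduler weighs more factors than this.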
>>
>
> Did you actually read my e-mail?
>
> Setting Max Pending Latency doesn't force requests to stay in the pending 
> queue for the specified time. Please use Min Pending Latency instead.
> Can you try this first? If it doesn't work, then try 2 min idle instances.
>  
>
>>
>> > ... here is what you've seen in the past weeks.
>> >
>> > * You have almost always had the 'Automatic-2' idle instance setting.
>> > * More than 3 weeks ago, the number of loading requests was very low.
>> > * Recently you have seen more loading requests than before.
>>
>> That's right!  To be even more concrete: on 16 August the problems 
>> got significantly worse. Please check especially the period from 16 
>> August until today. 
>>
>> > First of all, it seems that you deployed 2 new versions on Aug 1 and 
>> Aug 2. Can you describe what kind of changes were in those versions?
>>
>> I checked it in our version control. As I wrote, no related changes were 
>> made! Just HTML/CSS stuff:
>>  * One picture upload
>>  * One html change
>>  * One JavaScript change
>>  * One css change
>>
>>
>> > And, to be fair, we don't know of any change in our scheduler around 
>> 3 weeks ago that could cause this issue.
>>
>> And around 16 August?  
>
>
> Sigh... isn't this a waste of time? Why did you pick that date? 
>  
>
>>  
>
>
>> > More than 3 weeks ago, those 2 idle instances might have lived longer 
>> than they do now, but that was not guaranteed behavior. Please think of it this way: 
>> you were just kind of lucky. 
>>
>> That shouldn't be luck! If GAE is not able to start Java instances in 
>> 5 to 10 seconds, there needs to be a guarantee that instances live 
>> longer. Otherwise Java applications on GAE are unusable because users would 
>> face a lot of 30-second waits (--> "failed requests"). (See also my next 
>> comment regarding resident instances.)
>>
>>
>> > If you want some instances always active, please set min idle instances.
>>
>> I tried this some days ago. I had one resident instance. But that 
>> changed nothing. Instances got started and stopped as before. I assumed 
>> that requests would go to the resident instance first, but that was not the 
>> case. The resident instance was idle, but a dynamic instance got started and 
>> the request waited 30 seconds.
>>
>> Please check other discussions on this list and issues that reported 
>> similar observations. 
>>
>
> So I'd say please try 2. If you still see user-facing loading 
> requests, you need more resident instances to eliminate them.
>  
>
>>  
>> > As you can see, I'm still not convinced that the scheduler 
>> is misbehaving. I understand that you're having experiences which are a bit 
>> worse than 3 weeks ago, and understand your feeling that you want to tell 
>> us 'fix it', but I'd say it's still something along the lines of 'expected 
>> behavior' at least for now.
>> > If you feel differently, please let me know.
>>
>> Yes, I do feel differently (please see my answers above). 
>>
>> Please accept 
>> http://code.google.com/p/googleappengine/issues/detail?id=8004
>>
>
> So what is your expected behavior, and what is the actual result? Nobody in our 
> team can do anything if you just keep saying "the setting that used to work 
> doesn't work anymore" without trying my suggestion.
>
> I think my answer is clear on at least two points. 1) You should use 
> 'min pending latency' instead of 'max pending latency' to prevent new 
> instances from spinning up as much as possible. 2) If you need instances to live 
> longer, set an appropriate number of min idle instances.
>
> -- Takashi
>  
>
>>
>>
>> Thanks
>> Mos
>> http://www.mosbase.com
>>
>>
>> On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo 
>> <[email protected]> wrote:
>>
>>>
>>> Hi Mos,
>>>
>>> On Fri, Aug 24, 2012 at 6:05 PM, Mos <[email protected]> wrote:
>>>
>>>> > A possible explanation could be that the traffic pattern had changed.
>>>>
>>>> No. It's the same. Check, for example, the Requests/Second statistics of 
>>>> my application for the last 30 days! 
>>>
>>>
>>>> >> It's very obvious that one instance should be enough for my 
>>>> application. And that was more or less the case over the last months!
>>>> > Actually, that's not true. In particular, check this log:
>>>>
>>>> That's one exception where one client made 8 requests in a minute (+ one 
>>>> Pingdom check). Nothing else in that minute.
>>>> In those exceptional cases it could be OK if a second instance starts. 
>>>> (Nevertheless, can't one instance handle 8 requests a minute?)
>>>>
>>>
>>> The issue here is not 8 requests in a minute. Actually there were almost 
>>> 8 requests in a second. So App Engine likely needed more than one instance 
>>> at this particular moment. Anyway, as you say, it probably explains just one 
>>> of the loading requests you're seeing, and it is not very important 
>>> for this topic.
>>>
>>> This is a bit of a digression, but at first glance the Requests/Second 
>>> stat seems an appropriate data source for discussing how many instances are 
>>> actually needed; in fact, it's not. Real traffic is not spread 
>>> evenly over time.
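Why an averaged rate hides bursts can be shown with a minute resembling the one in the log: about 8 requests within one second plus a single Pingdom check. The timestamps below are invented for illustration, and bucketing by whole seconds is a simplification (a burst straddling a bucket boundary would be split; a sliding window would avoid that).

```python
from collections import Counter
from typing import List


def average_rate(timestamps: List[float], window: float) -> float:
    """Requests per second averaged over the whole window."""
    return len(timestamps) / window


def peak_per_second(timestamps: List[float]) -> int:
    """Largest number of requests falling into any single one-second bucket."""
    buckets = Counter(int(t) for t in timestamps)
    return max(buckets.values()) if buckets else 0


# Hypothetical minute: 8 requests within one second plus one Pingdom check.
minute = [10.05, 10.10, 10.20, 10.30, 10.45, 10.60, 10.75, 10.90, 42.0]
print(round(average_rate(minute, 60.0), 2))  # 0.15 req/s averaged over the minute
print(peak_per_second(minute))               # 8 requests in the busiest second
```

The same minute reads as 0.15 req/s on an averaged chart but contains a second with 8 concurrent-ish requests, which is what the scheduler actually reacts to.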
>>>   
>>>
>>>>
>>>> As I described: instances are started and stopped without reason, even 
>>>> when there is little traffic per minute!
>>>
>>>
>>> Okay. As far as I understand, here is what you've seen in the past weeks.
>>>
>>> * You have almost always had the 'Automatic-2' idle instance setting.
>>> * More than 3 weeks ago, the number of loading requests was very low.
>>> * Recently you have seen more loading requests than before.
>>>
>>> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 
>>> 2. Can you describe what kind of changes were in those versions?
>>> I'd like to make sure that there are no changes that could cause the 
>>> scheduler/app server to behave differently.
>>>
>>> Especially if you want me to escalate this issue to our engineering 
>>> team, you should provide exact information. You say 'My application is 
>>> unchanged', but in fact you deployed a new version on the day when, as you 
>>> described, the issue started. I need to make sure that there is no big 
>>> change that could cause something bad.
>>>
>>> And, to be fair, we don't know of any change in our scheduler around 3 
>>> weeks ago that could cause this issue.
>>>
>>> Secondly, you're setting max idle instances = 2. It does not guarantee 
>>> that you always have 2 instances. It just guarantees that we will never 
>>> charge you for more than 2 idle instances at any time.
>>>
>>> More than 3 weeks ago, those 2 idle instances might have lived longer 
>>> than they do now, but that was not guaranteed behavior. Please think of it this way: 
>>> you were just kind of lucky. Now, presumably one or two of those instances 
>>> are occasionally killed for some reason (there should be legitimate 
>>> reasons, but they are not something you need to care about).
>>>
>>> If you want some instances always active, please set min idle instances. 
>>> Certainly it will cost you a bit more, and you will lose the pending 
>>> queue, but considering the access pattern of your app (no bursty traffic 
>>> except for a few accesses from the iPhone browser), I would recommend trying 
>>> this setting in order to achieve what you want here. I'd recommend 2 idle 
>>> instances in this case, but you should decide the number.
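The difference between the two idle-instance settings, as described above, can be captured in a toy model: max idle instances only caps how many idle instances are billed, while min idle instances is the only setting that keeps instances loaded. This is a simplification for illustration; the class and function names are hypothetical, and 'Automatic' is modeled as `None` (assumed here to mean no cap and no floor).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdleSettings:
    min_idle: Optional[int] = None  # None models 'Automatic' (no floor)
    max_idle: Optional[int] = None  # None models 'Automatic' (no billing cap, assumed)


def billed_idle(actual_idle: int, s: IdleSettings) -> int:
    """Idle instances charged: max idle only caps the charge; it keeps nothing alive."""
    return actual_idle if s.max_idle is None else min(actual_idle, s.max_idle)


def guaranteed_idle(s: IdleSettings) -> int:
    """Idle instances kept loaded regardless of traffic: only min idle sets a floor."""
    return s.min_idle or 0


automatic_2 = IdleSettings(min_idle=None, max_idle=2)  # the 'Automatic-2' setting
print(guaranteed_idle(automatic_2))   # 0: nothing is guaranteed to stay loaded
print(billed_idle(3, automatic_2))    # 2: a third idle instance would not be charged
```

Under this model 'Automatic-2' guarantees no resident instances at all, which matches the point that longer instance lives under that setting were never guaranteed.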
>>>  
>>>
>>>> > * What is the purpose of the max-pending-latency = 14.9 setting?
>>>>
>>>> " is high App Engine will allow requests to wait rather than start new 
>>>> Instances to process them"
>>>> --> One attempt to stop GAE from creating unnecessary instances.
>>>>
>>>
>>> I think you should set min pending latency instead of max pending 
>>> latency if you want to prevent new instances from spinning up. However, if you're 
>>> going to set min idle instances, this setting will mostly lose its effect. If 
>>> you don't want to set any min idle instances for whatever reason, please 
>>> consider setting min pending latency instead of max pending latency.
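How the two pending-latency knobs differ can be sketched as a toy decision function for a request waiting in the pending queue with no free instance. It follows the descriptions in this thread: below min pending latency the request keeps waiting; above max pending latency a new instance is started; in between, the scheduler may do either. This is not the real scheduler; the names are hypothetical, and representing an 'automatic' min as 0 s is an assumption.

```python
from enum import Enum


class Decision(Enum):
    KEEP_WAITING = "keep waiting in the pending queue"
    SCHEDULER_CHOICE = "scheduler may either wait or start an instance"
    START_INSTANCE = "start a new instance"


def pending_decision(waited: float, min_pending: float, max_pending: float) -> Decision:
    """Toy model for a request that has waited `waited` seconds with no free instance."""
    if min_pending > max_pending:
        raise ValueError("min pending latency must not exceed max pending latency")
    if waited < min_pending:
        return Decision.KEEP_WAITING      # no new instance before the minimum
    if waited < max_pending:
        return Decision.SCHEDULER_CHOICE  # waiting is allowed, not forced
    return Decision.START_INSTANCE        # waited too long: start an instance


# Max = 14.9 s with an 'automatic' min (modeled as 0 s): the wait is optional.
print(pending_decision(0.5, 0.0, 14.9).name)   # SCHEDULER_CHOICE
# Raising the min forces short waits instead of new instances.
print(pending_decision(0.5, 10.0, 14.9).name)  # KEEP_WAITING
```

The first case is why a high max pending latency alone did not stop new instances: it permits waiting but leaves the scheduler free to start an instance early.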
>>>   
>>>
>>>>
>>>> > * Can you try automatic-automatic for idle instances setting?
>>>>
>>>> I played around with this over the last few days and nothing changed. As I 
>>>> wrote: I had that configuration for months and it worked fine 3-4 weeks 
>>>> ago! 
>>>>
>>>  
>>>> > * What is the purpose of those Pingdom checks? What happens if you 
>>>> stop that?
>>>>
>>>> To be alerted if GAE is down again. "What happens if you stop that?" 
>>>> --> I wouldn't be angry anymore because I wouldn't notice downtimes of 
>>>> my GAE application. ;)
>>>>
>>>
>>>> Please forward 
>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004  to the 
>>>> relevant GAE department.
>>>>
>>>
>>> As you can see, I'm still not convinced that the scheduler is 
>>> misbehaving. I understand that you're having experiences which are a bit 
>>> worse than 3 weeks ago, and understand your feeling that you want to tell 
>>> us 'fix it', but I'd say it's still something along the lines of 'expected 
>>> behavior' at least for now.
>>>
>>> If you feel differently, please let me know.
>>>
>>> Regards,
>>>
>>> -- Takashi
>>>  
>>>
>>>>
>>>> Thanks! 
>>>>
>>>>
>>>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo 
>>>> <[email protected]> wrote:
>>>>
>>>>>
>>>>> Hi Mos,
>>>>>
>>>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <[email protected]> wrote:
>>>>>
>>>>>> Has anybody else experienced abnormal behavior of the 
>>>>>> instance scheduler in the last three weeks (in the last 7 days it got even 
>>>>>> worse)?  (Java / HRD)
>>>>>> Or does anybody have in-depth knowledge about it?
>>>>>>
>>>>>> Background: My application has been unchanged for weeks, the configuration has not 
>>>>>> changed, and the application's traffic is constant.
>>>>>> Traffic: One request per minute from Pingdom and around 200 
>>>>>> additional pageviews a day (== around 1500 pageviews a day). The 
>>>>>> peak is no more than 3-4 requests per minute.
>>>>>>
>>>>>
>>>>> A possible explanation could be that the traffic pattern had changed.
>>>>>   
>>>>>
>>>>>>
>>>>>> It's very obvious that one instance should be enough for my 
>>>>>> application. And that was more or less the case over the last months!
>>>>>>
>>>>>
>>>>> Actually, that's not true. In particular, check this log:
>>>>>
>>>>> https://appengine.google.com/logs?app_id=s~krisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>>>
>>>>> You can see that the iPhone client repeatedly requests your dynamic 
>>>>> resources within a very short amount of time. Presumably this is due to some 
>>>>> kind of 'prefetch' feature of that device. Are you aware of those accesses, 
>>>>> and that this access pattern can cause a new instance to start?
>>>>>
>>>>> I don't think this is the only reason, but it can explain why some 
>>>>> portion of your loading requests are expected behavior.
>>>>>
>>>>> Now I'd like to ask you some questions.
>>>>>
>>>>>
>>>>> * What is the purpose of the max-pending-latency = 14.9 setting?
>>>>> * Can you try automatic-automatic for idle instances setting?
>>>>> * What is the purpose of those Pingdom checks? What happens if you stop 
>>>>> that?
>>>>>  
>>>>>
>>>>>>
>>>>>> But now GAE creates 3 instances most of the time, of which one 
>>>>>> lives for days and the others are restarted around
>>>>>> 10 to 30 times a day. 
>>>>>> Because a loading request takes 30 to 40 seconds and requests 
>>>>>> wait for loading instances, many requests 
>>>>>> fail (users and Pingdom agree: *A request that takes more than a 
>>>>>> couple of seconds is a failed request!*)
>>>>>>
>>>>>> Please check the attached screenshots that show the behavior!
>>>>>>
>>>>>> Note:
>>>>>> - Killing instances manually did not help
>>>>>> - Idle Instances were (Automatic – 2). Changing them to anything else, 
>>>>>> e.g. (Automatic – 4), didn't change anything
>>>>>>
>>>>>> Thanks and Cheers
>>>>>>
>>>>>> Mos
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>  -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Google App Engine" group.
>>>>>> To post to this group, send email to 
>>>>>> [email protected].
>>>>>> To unsubscribe from this group, send email to 
>>>>>> [email protected].
>>>>>> For more options, visit this group at 
>>>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Takashi Matsuo | Developers Advocate | [email protected]
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
>
>
>
> 

To view this discussion on the web visit 
https://groups.google.com/d/msg/google-appengine/-/YIzxpRbmyHMJ.
