Re: [google-appengine] Weird Instance Scheduler

Kristopher Giesing Sun, 26 Aug 2012 14:47:42 -0700

Hi Takashi,

I created a new GAE app to test this and found that I'm not getting the 
same instance tuning controls in the new app ID that I am getting in my 
current one.


In my current app, I can set both min and max idle instances, and min and 
max pending latency.

In the new app, I can set only max idle instances, and min pending latency.

Any ideas why this would be the case?  It complicates the process of 
setting up a good testbed for this.
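
A testbed along the lines described in this thread (one request per minute, then look for slow responses) could be sketched roughly as below. This is only an illustration under stated assumptions: the 5-second threshold, the function names, and the placeholder URL in the usage note are hypothetical and do not come from App Engine itself. A slow response is treated as a *probable* loading request, which the application logs would still need to confirm.

```python
import time
import urllib.request
from typing import Callable, List, Tuple

# Hypothetical threshold: the thread reports loading requests of ~30 s,
# so anything slower than a few seconds is flagged as a probable cold start.
SLOW_THRESHOLD_S = 5.0


def timed_fetch(url: str, timeout_s: float = 60.0) -> float:
    """Fetch the URL once and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        resp.read()
    return time.monotonic() - start


def run_probe(
    url: str,
    count: int,
    interval_s: float = 60.0,
    fetch: Callable[[str], float] = timed_fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> List[float]:
    """Send `count` requests, `interval_s` apart, and collect their latencies."""
    latencies = []
    for i in range(count):
        latencies.append(fetch(url))
        if i < count - 1:  # no need to wait after the last request
            sleep(interval_s)
    return latencies


def slow_requests(
    latencies: List[float], threshold_s: float = SLOW_THRESHOLD_S
) -> List[Tuple[int, float]]:
    """Return (index, latency) pairs for requests slower than the threshold."""
    return [(i, s) for i, s in enumerate(latencies) if s > threshold_s]
```

For example, `slow_requests(run_probe("https://<your-app-id>.appspot.com/", 60))` would probe for about an hour and list the requests that look like cold starts. The `fetch` and `sleep` parameters are injectable so the loop can be exercised without network access.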

- Kris

On Friday, August 24, 2012 7:59:17 PM UTC-7, Takashi Matsuo (Google) wrote:
>
> On Sat, Aug 25, 2012 at 5:24 AM, Mos <[email protected]> wrote:
>
>>  > Setting Max Pending Latency doesn't force requests to be in the 
>> pending queue for the specified time. Please use Min Pending Latency 
>> instead.
>>
>> As you know my setting to "Min Pending Latency" was automatic. The 
>> expectation is that GAE takes a reasonable default latency if it is 
>> "automatic".
>> And you say:  Every parallel request starts a new instance if it is 
>> "automatic"? That would be a "Min Pending Latency" of zero and not 
>> "automatic".
>>
>> > If it doesn't work, try 2 min idle instances then
>>
>> Please check the responses of other users in this thread.  This feature is 
>> totally broken and cannot be used.
>
>
>>
>> >> And around the 16th of August?
>>  > Sigh... isn't it a waste of time? What is the reason you picked that 
>> date? 
>>
>> Did you see/study my pictures from the first post of this thread?
>> The statistics show that on this date the instance creation goes crazy.
>> I double-checked it with the Pingdom reports.
>> Starting on this day there were even more downtimes.
>>
>> > So I'd say please try 2. If you still see user-facing loading 
>> requests, you need more resident instances to eliminate the user-facing 
>> loading requests.
>>
>> Again: As I wrote in my post before, that does not work. Check the responses 
>> from Kristopher and Jeff on this thread.
>>
>>
> Yeah, it's very nice to hear concrete examples from Kristopher and Jeff, 
> other than just saying "I've tried that, but it didn't work".
>  
>
>>
>> > So what is your expected behavior and actual result? Nobody in our 
>> team can do anything if you just keep saying "the setting that used to work 
>> doesn't work anymore" without trying my suggestion.
>> > I think my answer is clear at least for some points. 1) You'd better 
>> use 'min pending latency' instead of 'max pending latency' to prevent new 
>> instances from spinning up as much as possible. 2) If you need longer 
>> instance lives, set an appropriate number of min idle instances.
>>
>> As I wrote: I tried different settings. As many other people in this 
>> group as well.
>> Me and other people are reporting: The settings are broken!
>> It's very easy to reproduce. Please set up an application, send one 
>> request per minute (or second), configure 1 or 2 or 3 min idle instances 
>> and check what is happening. You will see that new instances are started 
>> although resident instances are available.
>>
>
> It would be nice to have a complete reproducible case. I've just started the 
> experiment you mentioned. This time, it's just a helloworld application, 
> and I set 1 min idle instance and a 1-minute cron.
>
> Presumably it will just work fine. Then I will try slightly different 
> conditions. That way, I hope I can determine which conditions could be 
> the culprit. What do you think? Can you provide some simple projects 
> for that experiment?
>
>
>> Please take it seriously and let one of the engineers check this!
>>
>
> (I'm one of the engineers btw) A reproducible case is always the best 
> thing to get engineers' attention.
>
> Regards,
>
> -- Takashi
>
>
>> Cheers
>> Mos 
>>
>>
>> On Fri, Aug 24, 2012 at 8:43 PM, Takashi Matsuo <[email protected]> wrote:
>>
>>>
>>> Hi Mos,
>>>
>>> On Sat, Aug 25, 2012 at 1:39 AM, Mos <[email protected]> wrote:
>>>
>>>> Hello Takashi,
>>>>
>>>>
>>>> > Actually there were almost 8 requests in a second. So App Engine 
>>>> likely needed more than one instance at this particular moment.
>>>>
>>>> I thought this is why GAE has the concept of "pending-latency"  (which 
>>>> we discussed below).
>>>> Meaning:  Incoming requests may wait up to 15 seconds before starting a 
>>>> new instance. Therefore when 8 requests in one second occur that
>>>> should not mean that more instances need to be started. Especially if 
>>>> there is no other traffic in this minute, as seen in my example.
>>>> Otherwise it would be a very bad implementation:
>>>> Starting a new instance means around 30s waiting time.  Serving 8 
>>>> parallel requests from one instance, would result in a maximum of
>>>> 8 seconds for the last request (assuming that each request takes around 
>>>> 1 second).
>>>> There is no reason for this concrete example to fire up more instances 
>>>> and let requests wait more than 30 seconds until a new instance is loaded.
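
The arithmetic in the paragraph above can be written out as a small sketch. It uses only the figures quoted in the post (8 requests, about 1 second each, about 30 seconds to start an instance) and assumes, as the post does, that a warm instance serves the queued requests one after another; the function names are illustrative only.

```python
def worst_case_queued(n_requests: int, service_s: float) -> float:
    """Latency of the last request when one warm instance serves them serially."""
    return n_requests * service_s


def worst_case_cold_start(startup_s: float, service_s: float) -> float:
    """Latency of a request that is routed to a freshly started instance."""
    return startup_s + service_s


# Figures taken from the post: 8 requests, ~1 s each, ~30 s instance start.
queued = worst_case_queued(8, 1.0)
cold = worst_case_cold_start(30.0, 1.0)
print(f"queued on warm instance: {queued:.0f} s, sent to new instance: {cold:.0f} s")
# prints "queued on warm instance: 8 s, sent to new instance: 31 s"
```

Under these assumptions the queued worst case (8 s) stays below the 15-second pending latency mentioned in the post, while a request sent to a new instance waits about 31 s.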
>>>>
>>>
>>> Do you really read my e-mail?
>>>  
>>> Setting Max Pending Latency doesn't force requests to be in the pending 
>>> queue for the specified time. Please use Min Pending Latency instead.
>>> Can you try this first? If it doesn't work, try 2 min idle instances 
>>> then.
>>>  
>>>
>>>>
>>>> > ... here is what you've seen in the past weeks.
>>>> >
>>>> >* You have almost always used the 'Automatic-2' idle instance setting.
>>>> >* More than 3 weeks ago, there were very few loading requests.
>>>> > * Recently you have seen more loading requests than before.
>>>>
>>>> That's right!  To be even more concrete: On August 16 the problems 
>>>> got significantly worse. Please check especially the time range from 
>>>> August 16 until today. 
>>>>
>>>> > First of all, it seems that you deployed 2 new versions on Aug 1 and 
>>>> Aug 2. Can you describe what kind of changes were in those versions?
>>>>
>>>> I checked it in our version control. As I wrote no related changes were 
>>>> made! Just Html/Css stuff:
>>>>  * One picture upload
>>>>  * One html change
>>>>  * One JavaScript change
>>>>  * One css change
>>>>
>>>>
>>>> > And, to be fair, we didn't think of any change in our scheduler 
>>>> around 3 weeks ago which can cause this issue.
>>>>
>>>> And around the 16th of August?
>>>
>>>
>>> Sigh... isn't it a waste of time? What is the reason you picked that 
>>> date? 
>>>  
>>>
>>>>  
>>>
>>>
>>>> > More than 3 weeks before, those 2 idle instances might have had 
>>>> longer lives than now, but it was not a concrete behavior. Please think 
>>>> this way: you were just kind of lucky. 
>>>>
>>>> That shouldn't be luck! If GAE is not able to start Java instances in 
>>>> 5 to 10 seconds, there needs to be a guarantee that instances have longer 
>>>> lives.  Otherwise Java applications on GAE are unusable because users would 
>>>> face many 30-second waits  (--> "failed requests"). (See also the next 
>>>> comment regarding resident instances)
>>>>
>>>>
>>>> > If you want some instances always active, please set min idle 
>>>> instances.
>>>>
>>>> I tried this some days ago. I had one resident instance. But that 
>>>> changed nothing.  Instances get started and stopped as before. I assumed 
>>>> that requests would go to the resident instance first. But that was not 
>>>> the case. The resident instance was idle, but a dynamic instance got 
>>>> started and the request waited 30 seconds.
>>>>
>>>> Please check other discussions on this list and issues that reported 
>>>> similar observations. 
>>>>
>>>
>>> So I'd say please try 2. If you still see user-facing loading 
>>> requests, you need more resident instances to eliminate the user-facing 
>>> loading requests.
>>>  
>>>
>>>>  
>>>> > As you can see, I'm still not convinced that the scheduler 
>>>> is misbehaving. I understand that your experience is a bit 
>>>> worse than 3 weeks ago, and understand your feeling that you want to tell 
>>>> us 'fix it', but I'd say it's still something in the line of 'expected 
>>>> behavior' at least for now.
>>>> > If you feel differently, please let me know.
>>>>
>>>> Yes I do feel differently (please see answers above). 
>>>>
>>>> Please accept 
>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004
>>>>
>>>
>>> So what is your expected behavior and actual result? Nobody in our 
>>> team can do anything if you just keep saying "the setting that used to work 
>>> doesn't work anymore" without trying my suggestion.
>>>
>>> I think my answer is clear at least for some points. 1) You'd better use 
>>> 'min pending latency' instead of 'max pending latency' to prevent new 
>>> instances from spinning up as much as possible. 2) If you need longer 
>>> instance lives, set an appropriate number of min idle instances.
>>>  
>>> -- Takashi
>>>  
>>>
>>>>
>>>>
>>>> Thanks
>>>> Mos
>>>> http://www.mosbase.com
>>>>
>>>>
>>>> On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo <[email protected]> wrote:
>>>>
>>>>>
>>>>> Hi Mos,
>>>>>
>>>>> On Fri, Aug 24, 2012 at 6:05 PM, Mos <[email protected]> wrote:
>>>>>
>>>>>> > A possible explanation could be that the traffic pattern has 
>>>>>> changed.
>>>>>>
>>>>>> No. It's the same. Check for example the Request/Seconds statistics 
>>>>>> of my application for the last 30 days! 
>>>>>
>>>>>
>>>>>> >> It's very obvious that one instance should be enough for my 
>>>>>> application. And that was almost the case the last months!
>>>>>> > Actually it's not true. In particular, check this log:
>>>>>>
>>>>>> That's one exception where one client made 8 requests in a minute  (+ 
>>>>>> one Pingdom). Nothing else this minute.
>>>>>> In those exceptional cases it could be OK if a second instance 
>>>>>> starts. (Nevertheless, can't one instance
>>>>>> handle 8 requests a minute?)
>>>>>>
>>>>>
>>>>> The issue here is not 8 requests in a minute. Actually there were 
>>>>> almost 8 requests in a second. So App Engine likely needed more than one 
>>>>> instance at this particular moment. Anyway, as you say, it probably 
>>>>> explains just one of the loading requests you're seeing, and it is not 
>>>>> a very important point in this topic.
>>>>>
>>>>> It's kind of digressing, but at first glance, the Requests/Seconds 
>>>>> stat seems an appropriate data source to discuss how many instances are 
>>>>> actually needed, but in fact, it's not. Real traffic is not spread 
>>>>> evenly over time.
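
The point about averaged request rates can be illustrated with a small sketch. The two timestamp lists below are made-up examples, not data from the application in question: both contain 8 requests within one minute, so their average rate is identical, yet the number of requests arriving within the same second differs widely.

```python
from collections import Counter
from typing import Sequence


def average_rate(timestamps: Sequence[float], window_s: float) -> float:
    """Average requests per second over the whole window."""
    return len(timestamps) / window_s


def peak_per_second(timestamps: Sequence[float]) -> int:
    """Largest number of requests that arrive within the same clock second."""
    counts = Counter(int(t) for t in timestamps)
    return max(counts.values(), default=0)


# Hypothetical arrival times (seconds into the minute), 8 requests each.
spread = [i * 7.5 for i in range(8)]        # one request every 7.5 s
burst = [10.0 + i * 0.1 for i in range(8)]  # all 8 within a single second

for name, ts in (("spread", spread), ("burst", burst)):
    print(name, round(average_rate(ts, 60.0), 3), peak_per_second(ts))
```

Both patterns average about 0.133 requests per second, but the peak is 1 request per second for the spread pattern and 8 for the burst, which is the kind of difference a per-minute or per-second average chart does not show.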
>>>>>   
>>>>>
>>>>>>
>>>>>> As I described:  Instances are started and stopped without reason, 
>>>>>> even when there is little traffic per minute!
>>>>>
>>>>>
>>>>> Okay. As far as I understand, here is what you've seen in the past 
>>>>> weeks.
>>>>>
>>>>> * You have almost always used the 'Automatic-2' idle instance setting.
>>>>> * More than 3 weeks ago, there were very few loading requests.
>>>>> * Recently you have seen more loading requests than before.
>>>>>
>>>>> First of all, it seems that you deployed 2 new versions on Aug 1 and 
>>>>> Aug 2. Can you describe what kind of changes were in those versions?
>>>>> I'd like to make sure that there are no changes that can cause the 
>>>>> scheduler/app server to behave differently.
>>>>>
>>>>> In particular, if you want me to escalate this issue to our engineering 
>>>>> team, you should provide exact information. You say 'My application is 
>>>>> unchanged', but in fact you deployed a new version on the day when, as 
>>>>> you described, the issue started. I need to make sure that there is no 
>>>>> big change which can cause something bad.
>>>>>
>>>>> And, to be fair, we didn't think of any change in our scheduler around 
>>>>> 3 weeks ago which can cause this issue.
>>>>>
>>>>> Secondly, you're setting max idle instances = 2. It does not guarantee 
>>>>> that you have always 2 instances. It just guarantees that we will never 
>>>>> charge you for more than 2 idle instances at any time.
>>>>>
>>>>> More than 3 weeks ago, those 2 idle instances might have had longer 
>>>>> lives than now, but it was not a concrete behavior. Please think this 
>>>>> way: you were just kind of lucky. Now, presumably one or two of those 
>>>>> instances are occasionally killed for some reason (there should be 
>>>>> certain legitimate reasons, but those are something you don't need to 
>>>>> care about).
>>>>>
>>>>> If you want some instances always active, please set min idle 
>>>>> instances. Certainly it will cost you a bit more, and you will lose the 
>>>>> pending queue, but considering the access pattern of your app (no bursty 
>>>>> traffic except for a few accesses from the iPhone browser), I would 
>>>>> recommend trying this setting in order to achieve what you want here. 
>>>>> I'd recommend 2 idle instances in this case, but you should decide the 
>>>>> number.
>>>>>  
>>>>>
>>>>>> > * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>>
>>>>>> " is high App Engine will allow requests to wait rather than start 
>>>>>> new Instances to process them"
>>>>>> --> One attempt to stop GAE from creating unnecessary instances.
>>>>>>
>>>>>
>>>>> I think you should set min pending latency instead of max pending 
>>>>> latency if you want to prevent new instances from spinning up. However, 
>>>>> if you're going to set min idle instances, this setting will lose most 
>>>>> of its effect. If you don't want to set any min idle instances for 
>>>>> whatever reason, please consider setting min pending latency instead of 
>>>>> max pending latency.
>>>>>   
>>>>>
>>>>>>
>>>>>> > * Can you try automatic-automatic for idle instances setting?
>>>>>>
>>>>>> I played around with this the last few days and nothing changed. As I 
>>>>>> wrote:  I had this configuration for months and it worked fine 3-4 
>>>>>> weeks ago! 
>>>>>>
>>>>>  
>>>>>> > * What is the purpose of those Pingdom checks? What happens if you 
>>>>>> stop that?
>>>>>>
>>>>>> To be alerted if GAE is down again. "What happens if you stop 
>>>>>> that?" --> I wouldn't be angry anymore because I wouldn't notice 
>>>>>> downtimes of my GAE application. ;)
>>>>>>
>>>>>
>>>>>> Please forward 
>>>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004  to 
>>>>>> the relevant GAE department.
>>>>>>
>>>>>
>>>>> As you can see, I'm still not convinced that the scheduler 
>>>>> is misbehaving. I understand that your experience is a bit 
>>>>> worse than 3 weeks ago, and understand your feeling that you want to tell 
>>>>> us 'fix it', but I'd say it's still something in the line of 'expected 
>>>>> behavior' at least for now.
>>>>>
>>>>> If you feel differently, please let me know.
>>>>>
>>>>> Regards,
>>>>>
>>>>> -- Takashi
>>>>>  
>>>>>
>>>>>>
>>>>>> Thanks! 
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Mos,
>>>>>>>
>>>>>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <[email protected]> wrote:
>>>>>>>
>>>>>>>> Has anybody else experienced abnormal behavior of the 
>>>>>>>> instance scheduler in the last three weeks (in the last 7 days it got 
>>>>>>>> even worse)?  (Java / HRD)
>>>>>>>> Or does anybody have in-depth knowledge about it?
>>>>>>>>
>>>>>>>> Background:  My application has been unchanged for weeks, the 
>>>>>>>> configuration has not changed and the application's traffic is constant.
>>>>>>>> Traffic: One request per minute from Pingdom and around 200 
>>>>>>>> additional pageviews a day (== around 1500 pageviews a day). The 
>>>>>>>> peak is not more than 3-4 requests per minute.
>>>>>>>>
>>>>>>>
>>>>>>> A possible explanation could be that the traffic pattern has changed.
>>>>>>>   
>>>>>>>
>>>>>>>>
>>>>>>>> It's very obvious that one instance should be enough for my 
>>>>>>>> application. And that was almost the case the last months!
>>>>>>>>
>>>>>>>
>>>>>>> Actually it's not true. In particular, check this log:
>>>>>>>
>>>>>>> https://appengine.google.com/logs?app_id=s~krisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>>>>>
>>>>>>> You can see the iPhone client repeatedly requests your dynamic 
>>>>>>> resources in a very short amount of time. Presumably it's due to some 
>>>>>>> kind of 'prefetch' feature of that device. Are you aware of those 
>>>>>>> accesses, and that this access pattern can cause a new instance to 
>>>>>>> start?
>>>>>>>
>>>>>>> I don't think this is the only reason, but this can explain that 
>>>>>>> some portion of your loading requests are expected behavior.
>>>>>>>
>>>>>>> Now I'd like to ask you some questions.
>>>>>>>
>>>>>>>
>>>>>>> * What is the purpose of the max-pending-latency = 14.9 setting?
>>>>>>> * Can you try automatic-automatic for the idle instances setting?
>>>>>>> * What is the purpose of those Pingdom checks? What happens if you 
>>>>>>> stop that?
>>>>>>>  
>>>>>>>
>>>>>>>>
>>>>>>>> But now GAE creates 3 instances most of the time, of which one has a 
>>>>>>>> long lifetime of days and the other ones are restarted around
>>>>>>>> 10 to 30 times a day. 
>>>>>>>> Because a loading request takes between 30s and 40s and requests are 
>>>>>>>> waiting for loading instances, there are many requests that
>>>>>>>> fail  (Users and Pingdom agree: *A request that takes more than a 
>>>>>>>> couple of seconds is a failed request!*)
>>>>>>>>
>>>>>>>> Please check the attached screenshots that show the behavior!
>>>>>>>>
>>>>>>>> Note:
>>>>>>>> - Killing instances manually did not help
>>>>>>>> - Idle Instances were ( Automatic – 2 ) .  Changing it to something 
>>>>>>>> else did not change anything; e.g. ( Automatic – 4 )
>>>>>>>>
>>>>>>>> Thanks and Cheers
>>>>>>>>
>>>>>>>> Mos
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  -- 
>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>> Groups "Google App Engine" group.
>>>>>>>> To post to this group, send email to 
>>>>>>>> [email protected]<javascript:>
>>>>>>>> .
>>>>>>>> To unsubscribe from this group, send email to 
>>>>>>>> [email protected] <javascript:>.
>>>>>>>> For more options, visit this group at 
>>>>>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Takashi Matsuo | Developers Advocate | [email protected]<javascript:>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>> Takashi Matsuo | Developers Advocate | [email protected] <javascript:>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> -- 
>>> Takashi Matsuo | Developers Advocate | [email protected] <javascript:>
>>>
>>>
>>
>>
>
>
>
> -- 
> Takashi Matsuo | Developers Advocate | [email protected] <javascript:>
>
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/google-appengine/-/vypbs4jA5cgJ.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/google-appengine?hl=en.
