Re: [google-appengine] Weird Instance Scheduler

Takashi Matsuo Fri, 24 Aug 2012 19:59:21 -0700

On Sat, Aug 25, 2012 at 5:24 AM, Mos <[email protected]> wrote:

>  > Setting Max Pending Latency doesn't force requests to be in the pending
> queue for the specified time. Please use Min Pending Latency instead.
>
> As you know, my "Min Pending Latency" setting was automatic. The
> expectation is that GAE picks a reasonable default latency if it is
> "automatic".
> And you say:  Every parallel request starts a new instance if it is
> "automatic"? That would be a "Min Pending Latency" of zero and not
> "automatic".
>
> > If it doesn't work, try 2 min idle instances then
>
> Please check the responses of other users in this thread.  This feature is
> totally broken and cannot be used.



>
> >> And around the 16th of August?
>  > Sigh... isn't it a waste of time? What is the reason you picked that
> date?
>
> Did you see/study my pictures from the first post of this thread?
> The statistic shows that on this date the instance creation gets crazy.  I
> double checked it with the Pingdom reports.
> Starting on this day there were even more downtimes.
>
> > So I'd say please try 2. If you still see user-facing loading
> requests, you need more resident instances to eliminate them.
>
> Again: As I wrote in my earlier post, that does not work. Check the responses
> from Kristopher and Jeff on this thread.
>
>
>
Yeah, it's very nice to hear concrete examples from Kristopher and Jeff,
rather than just "I've tried that, but it didn't work".


>
> > So what is your expected behavior and actual result? Nobody in our
> team can do anything if you just keep saying "the setting that used to work
> doesn't work anymore" without trying my suggestion.
> > I think my answer is clear at least on some points. 1) You'd better use
> 'min pending latency' instead of 'max pending latency' to prevent new
> instances from spinning up as much as possible. 2) If you need longer instance
> lives, set an appropriate number of min idle instances.
>
> As I wrote: I tried different settings, as have many other people in this
> group.
> Other people and I are reporting: The settings are broken!
> It's very easy to reproduce. Please set up an application, send one
> request per minute (or second), configure 1 or 2 or 3 min idle instances
> and check what is happening. You will see that new instances are started
> although resident instances are available.
>

It would be nice to have a complete reproducible case. I've just started the
experiment you mentioned. For now it's just a helloworld application, with
min idle instances set to 1 and a cron job that runs every minute.

Presumably it will just work fine. Then I will vary the conditions slightly,
one at a time. That way, I hope I can determine which condition, if any, is
the culprit. What do you think? Can you provide some simple projects
for that experiment?
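
For anyone who wants to run a similar check from outside App Engine, here is a
minimal sketch in Python of a probe that sends one request per interval and
counts suspiciously slow responses. The function names (timed_fetch,
run_probe, summarize) and the 5 second threshold are assumptions made for
illustration, not anything defined by App Engine. A slow response only hints
at a loading request; the request logs remain the authoritative source.

```python
import time
import urllib.request


def timed_fetch(url, timeout=60.0):
    """Fetch url once and return the elapsed wall-clock time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def run_probe(url, count, interval_s=60.0, fetch=timed_fetch, sleep=time.sleep):
    """Send `count` requests, pausing `interval_s` seconds between them.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without touching the network or actually waiting.
    """
    latencies = []
    for i in range(count):
        latencies.append(fetch(url))
        if i < count - 1:
            sleep(interval_s)
    return latencies


def summarize(latencies, slow_threshold_s=5.0):
    """Count responses slow enough to suggest a loading request.

    The threshold is an arbitrary assumption, not an App Engine constant.
    """
    slow = [s for s in latencies if s >= slow_threshold_s]
    return {
        "total": len(latencies),
        "slow": len(slow),
        "max": max(latencies, default=0.0),
    }
```

With a real application URL of your own, something like
summarize(run_probe(url, count=60)) would cover one hour at one request per
minute, which roughly matches the traffic pattern described in this thread.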


> Please take it seriously and let one of the engineers check this!
>

(I'm one of the engineers btw) A reproducible case is always the best thing
to get engineers' attention.

Regards,

-- Takashi


> Cheers
> Mos
>
>
> On Fri, Aug 24, 2012 at 8:43 PM, Takashi Matsuo <[email protected]> wrote:
>
>>
>> Hi Mos,
>>
>> On Sat, Aug 25, 2012 at 1:39 AM, Mos <[email protected]> wrote:
>>
>>> Hello Takashi,
>>>
>>>
>>> > Actually there were almost 8 requests in a second. So App Engine
>>> likely needed more than one instance at this particular moment.
>>>
>>> I thought this is why GAE has the concept of "pending latency"  (which
>>> we discussed below).
>>> Meaning:  Incoming requests may wait up to 15 seconds before a new
>>> instance is started. Therefore, 8 requests arriving in one second
>>> should not mean that more instances need to be started. Especially if
>>> there is no other traffic in this minute, as seen in my example.
>>> Otherwise it would be a very bad implementation:
>>> Starting a new instance means around 30s waiting time.  Serving 8
>>> parallel requests from one instance would result in a maximum of
>>> 8 seconds for the last request (assuming that each request takes around
>>> 1 second).
>>> There is no reason in this concrete example to fire up more instances
>>> and let requests wait more than 30 seconds until a new instance is loaded.
>>>
>>
>> Did you really read my e-mail?
>>
>> Setting Max Pending Latency doesn't force requests to be in the pending
>> queue for the specified time. Please use Min Pending Latency instead.
>> Can you try this first? If it doesn't work, try 2 min idle instances then.
>>
>>
>>>
>>> > ... here is what you've seen in the past weeks.
>>> >
>>> > * You have almost always used the 'Automatic-2' idle instance setting.
>>> > * More than 3 weeks ago, there were very few loading requests.
>>> > * Recently you have seen more loading requests than before.
>>>
>>> That's right!  To be even more concrete: On August 16 the problems
>>> got significantly worse. Please check especially the period from
>>> August 16 until today.
>>>
>>> > First of all, it seems that you deployed 2 new versions on Aug 1 and
>>> Aug 2. Can you describe what kind of changes those versions contain?
>>>
>>> I checked it in our version control. As I wrote, no related changes were
>>> made! Just HTML/CSS stuff:
>>>  * One picture upload
>>>  * One HTML change
>>>  * One JavaScript change
>>>  * One CSS change
>>>
>>>
>>> > And, to be fair, we can't think of any change in our scheduler around
>>> 3 weeks ago which could cause this issue.
>>>
>>> And around the 16th of August?
>>
>>
>> Sigh... isn't it a waste of time? What is the reason you picked that
>> date?
>>
>>
>>>
>>
>>
>>> > More than 3 weeks ago, those 2 idle instances might have had longer
>>> lives than now, but that was never guaranteed behavior. Please think of it
>>> this way: you were just kind of lucky.
>>>
>>> That shouldn't be luck! If GAE is not able to start Java instances in
>>> 5 to 10 seconds, there needs to be a guarantee that instances have longer
>>> lives.  Otherwise Java applications on GAE are unusable because users would
>>> see a lot of 30-second waits  (--> "failed requests"). (See also the next
>>> comment regarding resident instances)
>>>
>>>
>>> > If you want some instances always active, please set min idle
>>> instances.
>>>
>>> I tried this some days ago. I had one resident instance. But that
>>> changed nothing.  Instances got started and stopped as before. I assumed
>>> that requests would go to the resident instance first. But that was not the
>>> case. The resident instance was idle, but a dynamic instance got started and
>>> the request waited 30 seconds.
>>>
>>> Please check other discussions on this list and issues that reported
>>> similar observations.
>>>
>>
>> So I'd say please try 2. If you still see user-facing loading
>> requests, you need more resident instances to eliminate them.
>>
>>
>>>
>>> > As you can see, I'm still not convinced that the scheduler
>>> is misbehaving. I understand that your experience is a bit
>>> worse than 3 weeks ago, and understand your feeling that you want to tell
>>> us 'fix it', but I'd say it's still something in the line of 'expected
>>> behavior' at least for now.
>>> > If you feel differently, please let me know.
>>>
>>> Yes I do feel differently (please see answers above).
>>>
>>> Please accept
>>> http://code.google.com/p/googleappengine/issues/detail?id=8004
>>>
>>
>> So what is your expected behavior and actual result? Nobody in our
>> team can do anything if you just keep saying "the setting that used to work
>> doesn't work anymore" without trying my suggestion.
>>
>> I think my answer is clear at least on some points. 1) You'd better use
>> 'min pending latency' instead of 'max pending latency' to prevent new
>> instances from spinning up as much as possible. 2) If you need longer instance
>> lives, set an appropriate number of min idle instances.
>>
>> -- Takashi
>>
>>
>>>
>>>
>>> Thanks
>>> Mos
>>> http://www.mosbase.com
>>>
>>>
>>> On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo <[email protected]> wrote:
>>>
>>>>
>>>> Hi Mos,
>>>>
>>>> On Fri, Aug 24, 2012 at 6:05 PM, Mos <[email protected]> wrote:
>>>>
>>>>> > A possible explanation could be that the traffic pattern had changed.
>>>>>
>>>>> No. It's the same. Check for example the Request/Seconds statistics of
>>>>> my application for the last 30 days!
>>>>
>>>>
>>>>> >> It's very obvious that one instance should be enough for my
>>>>> application. And that was almost the case the last months!
>>>>> > Actually it's not true. In particular, check this log:
>>>>>
>>>>> That's one exception where one client did 8 requests in a minute  (+
>>>>> one Pingdom). Nothing else this minute.
>>>>> In those exceptional cases it could be ok if a second instance starts.
>>>>> (Nevertheless, can't one instance
>>>>> handle 8 requests a minute?)
>>>>>
>>>>
>>>> The issue here is not 8 requests in a minute. Actually there were
>>>> almost 8 requests in a second. So App Engine likely needed more than one
>>>> instance at this particular moment. Anyway, as you say, it probably
>>>> explains just one of the loading requests you're seeing, and it is not
>>>> a very important point in this discussion.
>>>>
>>>> It's kind of a digression, but at first glance, the Requests/Seconds
>>>> stat seems an appropriate data source to discuss how many instances are
>>>> actually needed, but in fact, it's not. Real traffic is not spread
>>>> evenly over time.
>>>>
>>>>
>>>>>
>>>>> As I described:  Instances are started and stopped without reason,
>>>>> even when there is little traffic per minute!
>>>>
>>>>
>>>> Okay. As far as I understand, here is what you've seen in the past
>>>> weeks.
>>>>
>>>> * You have almost always used the 'Automatic-2' idle instance setting.
>>>> * More than 3 weeks ago, there were very few loading requests.
>>>> * Recently you have seen more loading requests than before.
>>>>
>>>> First of all, it seems that you deployed 2 new versions on Aug 1 and
>>>> Aug 2. Can you describe what kind of changes those versions contain?
>>>> I'd like to make sure that there are no changes that could cause the
>>>> scheduler/app server to behave differently.
>>>>
>>>> In particular, if you want me to escalate this issue to our engineering
>>>> team, you should provide exact information. You say 'My application is
>>>> unchanged', but in fact you deployed a new version on the day when, by
>>>> your description, the issue started. I need to make sure that there is no
>>>> big change which could cause something bad.
>>>>
>>>> And, to be fair, we can't think of any change in our scheduler around
>>>> 3 weeks ago which could cause this issue.
>>>>
>>>> Secondly, you're setting max idle instances = 2. It does not guarantee
>>>> that you always have 2 instances. It just guarantees that we will never
>>>> charge you for more than 2 idle instances at any time.
>>>>
>>>> More than 3 weeks ago, those 2 idle instances might have had longer
>>>> lives than now, but that was never guaranteed behavior. Please think of it
>>>> this way: you were just kind of lucky. Now, presumably one or two of those
>>>> instances are occasionally killed for some reason (there should be
>>>> legitimate reasons, but they are not something you need to care about).
>>>>
>>>> If you want some instances always active, please set min idle
>>>> instances. Certainly it will cost you a bit more, and you will lose the
>>>> pending queue, but considering the access pattern of your app (no bursty
>>>> traffic except for a few accesses from the iPhone browser), I would
>>>> recommend trying this setting in order to achieve what you want here. I'd
>>>> recommend 2 idle instances in this case, but you should decide the number.
>>>>
>>>>
>>>>> > * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>
>>>>> " is high App Engine will allow requests to wait rather than start new
>>>>> Instances to process them"
>>>>> --> One attempt to stop GAE from creating unnecessary instances.
>>>>>
>>>>
>>>> I think you should set min pending latency instead of max pending
>>>> latency if you want to prevent new instances from spinning up. However,
>>>> if you're going to set min idle instances, this setting will have almost
>>>> no effect. If you don't want to set any min idle instances for whatever
>>>> reason, please consider setting min pending latency instead of max
>>>> pending latency.
>>>>
>>>>
>>>>>
>>>>> > * Can you try automatic-automatic for idle instances setting?
>>>>>
>>>>> I played around with this over the last days and nothing changed. As I
>>>>> wrote:  I had that configuration for months and it worked fine 3-4 weeks
>>>>> ago!
>>>>>
>>>>
>>>>> > * What is the purpose of those Pingdom checks? What happens if you
>>>>> stop that?
>>>>>
>>>>> To be alerted if GAE is down again. "What happens if you stop that?"
>>>>> --> I wouldn't be angry anymore because I wouldn't notice the downtimes of
>>>>> my GAE application. ;)
>>>>>
>>>>
>>>>> Please forward
>>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004  to
>>>>> the relevant GAE department.
>>>>>
>>>>
>>>> As you can see, I'm still not convinced that the scheduler
>>>> is misbehaving. I understand that your experience is a bit
>>>> worse than 3 weeks ago, and understand your feeling that you want to tell
>>>> us 'fix it', but I'd say it's still something in the line of 'expected
>>>> behavior' at least for now.
>>>>
>>>> If you feel differently, please let me know.
>>>>
>>>> Regards,
>>>>
>>>> -- Takashi
>>>>
>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>> Hi Mos,
>>>>>>
>>>>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <[email protected]> wrote:
>>>>>>
>>>>>>> Has anybody else experienced abnormal behavior of the
>>>>>>> instance scheduler over the last three weeks (in the last 7 days it
>>>>>>> got even worse)?  (Java / HRD)
>>>>>>> Or does anybody have in-depth knowledge about it?
>>>>>>>
>>>>>>> Background:  My application has been unchanged for weeks, the
>>>>>>> configuration has not changed and the application's traffic is constant.
>>>>>>> Traffic: One request per minute from Pingdom and around 200
>>>>>>> additional pageviews a day (== around 1500 pageviews a day). The
>>>>>>> peak is not more than 3-4 requests per minute.
>>>>>>>
>>>>>>
>>>>>> A possible explanation could be that the traffic pattern had changed.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> It's very obvious that one instance should be enough for my
>>>>>>> application. And that was almost the case the last months!
>>>>>>>
>>>>>>
>>>>>> Actually it's not true. In particular, check this log:
>>>>>>
>>>>>> https://appengine.google.com/logs?app_id=s~krisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>>>>
>>>>>> You can see the iPhone client repeatedly requests your dynamic
>>>>>> resources in a very short amount of time. Presumably it's due to some
>>>>>> kind of 'prefetch' feature of that device. Are you aware of those
>>>>>> accesses, and that this access pattern can cause a new instance to start?
>>>>>>
>>>>>> I don't think this is the only reason, but this can explain that some
>>>>>> portion of your loading requests are expected behavior.
>>>>>>
>>>>>> Now I'd like to ask you some questions.
>>>>>>
>>>>>>
>>>>>> * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>> * Can you try automatic-automatic for idle instances setting?
>>>>>> * What is the purpose of those Pingdom checks? What happens if you
>>>>>> stop that?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> But now GAE creates 3 instances most of the time, one of which has a
>>>>>>> long lifetime of days while the others are restarted around
>>>>>>> 10 to 30 times a day.
>>>>>>> Because a loading request takes between 30s and 40s and requests are
>>>>>>> waiting for loading instances, there are many requests that
>>>>>>> fail  (Users and Pingdom agree: *A request that takes more than a
>>>>>>> couple of seconds is a failed request!*)
>>>>>>>
>>>>>>> Please check the attached screenshots that show the behavior!
>>>>>>>
>>>>>>> Note:
>>>>>>> - Killing instances manually did not help
>>>>>>> - Idle Instances were ( Automatic – 2 ) .  Changing it to something
>>>>>>> else, e.g. ( Automatic – 4 ), did not change anything
>>>>>>>
>>>>>>> Thanks and Cheers
>>>>>>>
>>>>>>> Mos
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>  --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "Google App Engine" group.
>>>>>>> To post to this group, send email to
>>>>>>> [email protected].
>>>>>>> To unsubscribe from this group, send email to
>>>>>>> [email protected].
>>>>>>> For more options, visit this group at
>>>>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Takashi Matsuo | Developers Advocate | [email protected]
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>
>



-- 
Takashi Matsuo | Developers Advocate | [email protected]

