Re: [google-appengine] Weird Instance Scheduler

Mos Fri, 24 Aug 2012 14:28:04 -0700

Thanks Johan. I read that post a few days ago.

As often discussed on the mailing list before, and as Jeff said in this
thread, it's the combination of "Requests should never be sent to cold
instances." and (!) the behavior of min idle instances that doesn't make any
sense.


Please check the last comment of
http://code.google.com/p/googleappengine/issues/detail?id=8004 where I wrote
down the problems from my point of view.

Senior Java developers on this list who have many months of experience
with GAE have stated again and again that there is a big issue around
instance handling.
I think you have to trust your power users and assign a team to work on this!

On Fri, Aug 24, 2012 at 10:58 PM, Johan Euphrosine <[email protected]> wrote:

> Hi all,
>
> Please review the following thread where the lead engineer working on the
> scheduler (Jon McAlister) took the time to explain in great detail the
> behavior of min idle instance.
> https://groups.google.com/d/msg/google-appengine/nRtzGtG9790/hLS16qux_04J
>
> Once you read this, we can discuss if what you're experiencing is really a
> bug, or if you want the scheduler to behave differently from its current
> implementation, in which case the more constructive way out of this
> discussion is to file a feature request and get it starred by your peers.
> On Aug 24, 2012 10:24 PM, "Mos" <[email protected]> wrote:
>
>>  > Setting Max Pending Latency doesn't force requests to be in the
>> pending queue for the specified time. Please use Min Pending Latency
>> instead.
>>
>> As you know, my "Min Pending Latency" setting was automatic. The
>> expectation is that GAE picks a reasonable default latency if it is
>> "automatic".
>> And you say that every parallel request starts a new instance if it is
>> "automatic"? That would be a "Min Pending Latency" of zero and not
>> "automatic".
>>
>> > If it doesn't work, try 2 min idle instances then
>>
>> Please check the responses of other users in this thread.  This feature is
>> totally broken and cannot be used.
>>
>> >> And around the 16th of August?
>> > Sigh... isn't it a waste of time? What is the reason you picked that
>> date?
>>
>> Did you look at the pictures from the first post of this thread?
>> The statistics show that instance creation goes crazy on this date.
>> I double-checked it with the Pingdom reports.
>> Starting on this day there was even more downtime.
>>
>> > So I'd say please try 2. If you still see user-facing loading
>> requests, you need more resident instances to eliminate the user-facing
>> loading requests.
>>
>> Again: As I wrote in my previous post, that does not work. Check the
>> responses from Kristopher and Jeff in this thread.
>>
>> > So what is your expected behavior and actual result? Nobody in our
>> team can do anything if you just keep saying "the setting that used to work
>> doesn't work anymore" without trying my suggestion.
>> > I think my answer is clear at least for some points. 1) You'd better
>> use 'min pending latency' instead of 'max pending latency' to prevent new
>> instances from spinning up as much as possible. 2) If you need longer
>> instance lives, set an appropriate number of min idle instances.
>>
>> As I wrote: I tried different settings, as have many other people in this
>> group.
>> Other people and I are reporting: The settings are broken!
>> It's very easy to reproduce. Please set up an application, send one
>> request per minute (or second), configure 1 or 2 or 3 min idle instances
>> and check what is happening. You will see that new instances are started
>> although resident instances are available.
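For anyone who wants to try the reproduction described above, here is a minimal sketch of a probe that requests a URL at a fixed interval and flags slow responses. It is only an illustration: the 5-second threshold and all function names are assumptions, and a slow response only suggests a loading request; the application's request logs are needed to confirm it.

```python
import time
import urllib.request

# Assumed threshold: the thread treats anything over "a couple of seconds" as failed.
SLOW_SECONDS = 5.0


def timed_fetch(url, timeout=60.0):
    """Fetch url once and return the wall-clock seconds it took."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.monotonic() - start


def probe(url, count, interval_seconds, fetch=timed_fetch, sleep=time.sleep):
    """Fetch url `count` times, pausing between requests, and collect latencies."""
    latencies = []
    for i in range(count):
        latencies.append(fetch(url))
        if i < count - 1:
            sleep(interval_seconds)
    return latencies


def slow_requests(latencies, threshold=SLOW_SECONDS):
    """Return the indexes of requests slow enough to suspect a loading request."""
    return [i for i, seconds in enumerate(latencies) if seconds >= threshold]
```

Calling `probe(url, count=60, interval_seconds=60)` against a test application (the URL is whatever app you set up for the experiment) and passing the result to `slow_requests` gives the requests worth looking up in the logs. The `fetch` and `sleep` parameters exist only so the loop can be exercised without network access or waiting.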
>>
>> Please take this seriously and have one of the engineers check it!
>>
>> Cheers
>> Mos
>>
>>
>> On Fri, Aug 24, 2012 at 8:43 PM, Takashi Matsuo <[email protected]> wrote:
>>
>>>
>>> Hi Mos,
>>>
>>> On Sat, Aug 25, 2012 at 1:39 AM, Mos <[email protected]> wrote:
>>>
>>>> Hello Takashi,
>>>>
>>>>
>>>> > Actually there were almost 8 requests in a second. So App Engine
>>>> likely needed more than one instance at this particular moment.
>>>>
>>>> I thought this is why GAE has the concept of "pending-latency"  (which
>>>> we discussed below).
>>>> Meaning:  Incoming requests may wait up to 15 seconds before a new
>>>> instance is started. Therefore 8 requests in one second
>>>> should not mean that more instances need to be started, especially if
>>>> there is no other traffic in this minute, as seen in my example.
>>>> Otherwise it would be a very bad implementation:
>>>> Starting a new instance means around 30s of waiting time.  Serving 8
>>>> parallel requests from one instance would result in a maximum of
>>>> 8 seconds for the last request (assuming that each request takes around
>>>> 1 second).
>>>> There is no reason in this concrete example to fire up more instances
>>>> and let requests wait more than 30 seconds until a new instance is loaded.
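The comparison above can be written out as a small calculation. This is a sketch under the figures given in that paragraph (about 1 second per request, about 30 seconds to load an instance) plus one further assumption: the warm instance handles the burst strictly one request at a time. The function names are made up for illustration.

```python
def worst_case_queued(burst_size, service_seconds):
    """Completion time of the last request when one warm instance
    serves the whole burst serially (assumed: no concurrency)."""
    return burst_size * service_seconds


def worst_case_cold_start(loading_seconds, service_seconds):
    """Completion time of a request that is routed to an instance
    which still has to load before it can serve."""
    return loading_seconds + service_seconds


queued = worst_case_queued(burst_size=8, service_seconds=1.0)
cold = worst_case_cold_start(loading_seconds=30.0, service_seconds=1.0)
print(queued, cold)  # prints "8.0 31.0"
```

Under these assumptions even the last request of the burst finishes well before a request that has to wait for a new instance to load.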
>>>>
>>>
>>> Did you really read my e-mail?
>>>
>>> Setting Max Pending Latency doesn't force requests to be in the pending
>>> queue for the specified time. Please use Min Pending Latency instead.
>>> Can you try this first? If it doesn't work, try 2 min idle instances
>>> then.
>>>
>>>
>>>>
>>>> > ... here is what you've seen in the past weeks.
>>>> >
>>>> > * You have almost always used the 'Automatic-2' idle instance setting.
>>>> > * More than 3 weeks ago, the number of loading requests was very low.
>>>> > * Recently you have seen more loading requests than before.
>>>>
>>>> That's right!  To be even more concrete: On the 16th of August the
>>>> problems got significantly worse. Please check especially the period from
>>>> the 16th of August until today.
>>>>
>>>> > First of all, it seems that you deployed 2 new versions on Aug 1 and
>>>> Aug 2. Can you describe what kind of changes were in those versions?
>>>>
>>>> I checked it in our version control. As I wrote, no related changes were
>>>> made! Just HTML/CSS stuff:
>>>>  * One picture upload
>>>>  * One HTML change
>>>>  * One JavaScript change
>>>>  * One CSS change
>>>>
>>>>
>>>> > And, to be fair, we didn't think of any change in our scheduler
>>>> around 3 weeks ago which can cause this issue.
>>>>
>>>> And around the 16th of August?
>>>
>>>
>>> Sigh... isn't it a waste of time? What is the reason you picked that
>>> date?
>>>
>>>
>>>>
>>>
>>>
>>>> > More than 3 weeks ago, those 2 idle instances might have had
>>>> longer lives than now, but that was not guaranteed behavior. Please think
>>>> of it this way: you were just kind of lucky.
>>>>
>>>> That shouldn't be luck! If GAE is not able to start Java instances in
>>>> 5 to 10 seconds, there needs to be a guarantee that instances have longer
>>>> lives.  Otherwise Java applications on GAE are unusable because users would
>>>> have a lot of 30-second wait times (--> "failed requests"). (See also the
>>>> next comment regarding resident instances.)
>>>>
>>>>
>>>> > If you want some instances always active, please set min idle
>>>> instances.
>>>>
>>>> I tried this some days ago. I had one resident instance. But that
>>>> changed nothing.  Instances got started and stopped as before. I assumed
>>>> that requests would go to the resident instance first. But that was not the
>>>> case. The resident instance was idle, but a dynamic instance got started
>>>> and the request waited 30 seconds.
>>>>
>>>> Please check the other discussions on this list and the issues that
>>>> reported similar observations.
>>>>
>>>
>>> So I'd say please try 2. If you still see user-facing loading
>>> requests, you need more resident instances to eliminate the user-facing
>>> loading requests.
>>>
>>>
>>>>
>>>> > As you can see, I'm still not convinced that the scheduler
>>>> is misbehaving. I understand that you're having experiences which are a bit
>>>> worse than 3 weeks ago, and understand your feeling that you want to tell
>>>> us 'fix it', but I'd say it's still something in the line of 'expected
>>>> behavior' at least for now.
>>>> > If you feel differently, please let me know.
>>>>
>>>> Yes I do feel differently (please see answers above).
>>>>
>>>> Please accept
>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004
>>>>
>>>
>>> So what is your expected behavior and actual result? Nobody in our
>>> team can do anything if you just keep saying "the setting that used to work
>>> doesn't work anymore" without trying my suggestion.
>>>
>>> I think my answer is clear at least for some points. 1) You'd better use
>>> 'min pending latency' instead of 'max pending latency' to prevent new
>>> instances from spinning up as much as possible. 2) If you need longer
>>> instance lives, set an appropriate number of min idle instances.
>>>
>>> -- Takashi
>>>
>>>
>>>>
>>>>
>>>> Thanks
>>>> Mos
>>>> http://www.mosbase.com
>>>>
>>>>
>>>> On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo <[email protected]> wrote:
>>>>
>>>>>
>>>>> Hi Mos,
>>>>>
>>>>> On Fri, Aug 24, 2012 at 6:05 PM, Mos <[email protected]> wrote:
>>>>>
>>>>>> > A possible explanation could be that the traffic pattern had
>>>>>> changed.
>>>>>>
>>>>>> No. It's the same. Check for example the Request/Seconds statistics
>>>>>> of my application for the last 30 days!
>>>>>
>>>>>
>>>>>> >> It's very obvious that one instance should be enough for my
>>>>>> application. And that was almost the case the last months!
>>>>>> > Actually it's not true. In particular, check this log:
>>>>>>
>>>>>> That's one exception where one client made 8 requests in a minute  (+
>>>>>> one Pingdom request). Nothing else in this minute.
>>>>>> In those exceptional cases it could be OK if a second instance
>>>>>> starts. (Still, can't one instance
>>>>>> handle 8 requests a minute?)
>>>>>>
>>>>>
>>>>> The issue here is not 8 requests in a minute. Actually there were
>>>>> almost 8 requests in a second. So App Engine likely needed more than one
>>>>> instance at this particular moment. Anyway, as you say, it's probably just
>>>>> the reason for one of the loading requests you're seeing, and this is not
>>>>> a very important point for this topic.
>>>>>
>>>>> It's kind of a digression, but at first glance the Requests/Second
>>>>> stat seems an appropriate data source to discuss how many instances are
>>>>> actually needed, but in fact it's not. Real traffic is not spread
>>>>> evenly.
>>>>>
>>>>>
>>>>>>
>>>>>> As I described:  Instances are started and stopped without reason,
>>>>>> even when there is little traffic per minute!
>>>>>
>>>>>
>>>>> Okay. As far as I understand, here is what you've seen in the past
>>>>> weeks.
>>>>>
>>>>> * You have almost always used the 'Automatic-2' idle instance setting.
>>>>> * More than 3 weeks ago, the number of loading requests was very low.
>>>>> * Recently you have seen more loading requests than before.
>>>>>
>>>>> First of all, it seems that you deployed 2 new versions on Aug 1 and
>>>>> Aug 2. Can you describe what kind of changes were in those versions?
>>>>> I'd like to make sure that there are no changes that can cause the
>>>>> scheduler/app server to behave differently.
>>>>>
>>>>> Especially if you want me to escalate this issue to our engineering
>>>>> team, you should provide exact information. You say 'My application is
>>>>> unchanged', but in fact you deployed a new version on the day you
>>>>> say the issue started. I need to make sure that there is no big
>>>>> change which can cause something bad.
>>>>>
>>>>> And, to be fair, we didn't think of any change in our scheduler around
>>>>> 3 weeks ago which can cause this issue.
>>>>>
>>>>> Secondly, you're setting max idle instances = 2. It does not guarantee
>>>>> that you always have 2 instances. It just guarantees that we will never
>>>>> charge you for more than 2 idle instances at any time.
>>>>>
>>>>> More than 3 weeks ago, those 2 idle instances might have had longer
>>>>> lives than now, but that was not guaranteed behavior. Please think of it
>>>>> this way: you were just kind of lucky. Now, presumably one or two of those
>>>>> instances are occasionally killed for some reason (there should be
>>>>> legitimate reasons, but those are something you don't need to care about).
>>>>>
>>>>> If you want some instances always active, please set min idle
>>>>> instances. Certainly it will cost you a bit more, and you will lose the
>>>>> pending queue, but considering the access pattern of your app (no bursty
>>>>> traffic except for a few accesses from the iPhone browser), I would
>>>>> recommend trying this setting in order to achieve what you want here. I'd
>>>>> recommend 2 idle instances in this case, but you should decide the number.
>>>>>
>>>>>
>>>>>> > * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>>
>>>>>> " is high App Engine will allow requests to wait rather than start
>>>>>> new Instances to process them"
>>>>>> --> One attempt to stop GAE from creating unnecessary instances.
>>>>>>
>>>>>
>>>>> I think you should set min pending latency instead of max pending
>>>>> latency if you want to prevent new instances from spinning up. However, if
>>>>> you're going to set min idle instances, this setting will lose most of its
>>>>> effect. If you don't want to set any min idle instances for whatever
>>>>> reason, please consider setting min pending latency instead of max pending
>>>>> latency.
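As a rough illustration of the difference between the two settings as they are described in this thread, here is a toy model. It is not the scheduler's actual algorithm: the function name, the three outcomes and the example values are assumptions made up for this sketch. The idea is only that min pending latency acts as a lower bound before a new instance is considered, while max pending latency is an upper bound and does not hold requests in the queue.

```python
def pending_decision(waited_seconds, min_pending_latency, max_pending_latency):
    """Toy model (assumed, not the real scheduler) of what may happen to a
    request that has already waited `waited_seconds` in the pending queue."""
    if waited_seconds < min_pending_latency:
        # Below the lower bound: no new instance is started for this request yet.
        return "keep waiting"
    if waited_seconds < max_pending_latency:
        # Between the bounds: a new instance is allowed but not required.
        return "may start instance"
    # At or past the upper bound: waiting any longer is not acceptable.
    return "should start instance"


# A high max alone does not keep a request queued (hypothetical values).
print(pending_decision(0.5, min_pending_latency=0.01, max_pending_latency=14.9))
# Raising min is what holds the request back in this model.
print(pending_decision(0.5, min_pending_latency=10.0, max_pending_latency=14.9))
```

In this model the first call yields "may start instance" and the second "keep waiting", which is the distinction being made above between raising max and raising min.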
>>>>>
>>>>>
>>>>>>
>>>>>> > * Can you try automatic-automatic for idle instances setting?
>>>>>>
>>>>>> I played around with this over the last few days and nothing changed.
>>>>>> As I wrote:  I had that configuration for months and it worked fine 3-4
>>>>>> weeks ago!
>>>>>>
>>>>>
>>>>>> > * What is the purpose of those Pingdom checks? What happens if you
>>>>>> stop that?
>>>>>>
>>>>>> To be alerted if GAE is down again. "What happens if you stop
>>>>>> that?" --> I wouldn't be angry anymore because I wouldn't notice
>>>>>> the downtimes of my GAE application. ;)
>>>>>>
>>>>>
>>>>>> Please forward
>>>>>> http://code.google.com/p/googleappengine/issues/detail?id=8004  to
>>>>>> the relevant GAE department.
>>>>>>
>>>>>
>>>>> As you can see, I'm still not convinced that the scheduler
>>>>> is misbehaving. I understand that you're having experiences which are a bit
>>>>> worse than 3 weeks ago, and understand your feeling that you want to tell
>>>>> us 'fix it', but I'd say it's still something in the line of 'expected
>>>>> behavior' at least for now.
>>>>>
>>>>> If you feel differently, please let me know.
>>>>>
>>>>> Regards,
>>>>>
>>>>> -- Takashi
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi Mos,
>>>>>>>
>>>>>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <[email protected]> wrote:
>>>>>>>
>>>>>>>> Has anybody else experienced abnormal behavior of the
>>>>>>>> instance scheduler over the last three weeks (in the last 7 days it got
>>>>>>>> even worse)?  (Java / HRD)
>>>>>>>> Or does anybody have in-depth knowledge about it?
>>>>>>>>
>>>>>>>> Background:  My application has been unchanged for weeks, the
>>>>>>>> configuration has not changed and the application's traffic is constant.
>>>>>>>> Traffic: One request per minute from Pingdom and around 200
>>>>>>>> additional pageviews per day (== around 1500 pageviews per day). The
>>>>>>>> peak is no more than 3-4 requests per minute.
>>>>>>>>
>>>>>>>
>>>>>>> A possible explanation could be that the traffic pattern has changed.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> It's very obvious that one instance should be enough for my
>>>>>>>> application. And that was almost the case the last months!
>>>>>>>>
>>>>>>>
>>>>>>> Actually it's not true. In particular, check this log:
>>>>>>>
>>>>>>> https://appengine.google.com/logs?app_id=s~krisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>>>>>
>>>>>>> You can see the iPhone client repeatedly requests your dynamic
>>>>>>> resources in a very short amount of time. Presumably it's due to some
>>>>>>> kind of 'prefetch' feature of that device. Are you aware of those
>>>>>>> accesses, and that this access pattern can cause a new instance to start?
>>>>>>>
>>>>>>> I don't think this is the only reason, but this can explain that
>>>>>>> some portion of your loading requests are expected behavior.
>>>>>>>
>>>>>>> Now I'd like to ask you some questions.
>>>>>>>
>>>>>>>
>>>>>>> * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>>> * Can you try automatic-automatic for idle instances setting?
>>>>>>> * What is the purpose of those Pingdom checks? What happens if you
>>>>>>> stop that?
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> But now GAE runs 3 instances most of the time, one of which has a
>>>>>>>> long lifetime of days while the others are restarted around
>>>>>>>> 10 to 30 times a day.
>>>>>>>> Because a loading request takes between 30s and 40s and requests
>>>>>>>> wait for loading instances, many requests
>>>>>>>> fail  (Users and Pingdom agree: *A request that takes more than a
>>>>>>>> couple of seconds is a failed request!*)
>>>>>>>>
>>>>>>>> Please check the attached screenshots that show the behavior!
>>>>>>>>
>>>>>>>> Note:
>>>>>>>> - Killing instances manually did not help
>>>>>>>> - Idle Instances were ( Automatic – 2 ).  Changing it to anything else,
>>>>>>>> e.g. ( Automatic – 4 ), did not change anything
>>>>>>>>
>>>>>>>> Thanks and Cheers
>>>>>>>>
>>>>>>>> Mos
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>  --
>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>> Groups "Google App Engine" group.
>>>>>>>> To post to this group, send email to
>>>>>>>> [email protected].
>>>>>>>> To unsubscribe from this group, send email to
>>>>>>>> [email protected].
>>>>>>>> For more options, visit this group at
>>>>>>>> http://groups.google.com/group/google-appengine?hl=en.
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Takashi Matsuo | Developers Advocate | [email protected]
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Takashi Matsuo | Developers Advocate | [email protected]
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Takashi Matsuo | Developers Advocate | [email protected]
>>>
>>>
>>
>>
>

