Thanks, Kris, for describing your case. That's what I saw in my experiments also. The "min instance setting" is not the solution because it doesn't work as expected. I hope someone from GAE's team takes it seriously and elaborates on this.
On Fri, Aug 24, 2012 at 9:49 PM, Kristopher Giesing <[email protected]> wrote:
> Hi Takashi,
>
> I ran some experiments with an instance that had requests pending only from my own scripts (no user-facing traffic at all).
>
> What I found was that sending requests at about 1 req/sec, regularly spaced, caused GAE to spin up new instances randomly. If I set the min instances setting to anything but "automatic", the very first request would cause a new instance to spin up (this was true even if the min instances was some high number, like 8, and I waited for all 8 instances to finish launching before sending a request - so in this case the # of instances started at 9 for the very first request).
>
> The only solution I found for this behavior was to package the entire app as a backend.
>
> - Kris
>
> On Friday, August 24, 2012 11:43:23 AM UTC-7, Takashi Matsuo (Google) wrote:
>> Hi Mos,
>>
>> On Sat, Aug 25, 2012 at 1:39 AM, Mos <[email protected]> wrote:
>>> Hello Takashi,
>>>
>>> > Actually there were almost 8 requests in a second. So App Engine likely needed more than one instance at this particular moment.
>>>
>>> I thought this is why GAE has the concept of "pending-latency" (which we discussed below).
>>> Meaning: Incoming requests may wait up to 15 seconds before starting a new instance. Therefore, when 8 requests occur in one second, that should not mean that more instances need to be started. Especially if there is no other traffic in this minute, as seen in my example.
>>> Otherwise it would be a very bad implementation: Starting a new instance means around 30s waiting time. Serving 8 parallel requests from one instance would result in a maximum of 8 seconds for the last request (assuming that each request takes around 1 second).
>>> There is no reason for this concrete example to fire up more instances and let requests wait more than 30 seconds until a new instance is loaded.
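The arithmetic in Mos's argument above can be sketched as follows. This is only an illustration using the thread's own rough numbers (about 1 second per request, about 30 seconds to load a new Java instance). It assumes requests are served strictly one at a time; the function names are hypothetical, and it does not model how the App Engine scheduler actually decides.

```python
# Back-of-the-envelope comparison using the rough figures quoted in this thread.
# Assumptions (not App Engine behavior): requests are served strictly one at a
# time, every request takes the same time, and the whole burst arrives at once.

def worst_wait_single_instance(num_requests: int, service_seconds: float) -> float:
    """Completion time of the last request when one warm instance serves the burst serially."""
    return num_requests * service_seconds


def wait_on_new_instance(load_seconds: float, service_seconds: float) -> float:
    """Completion time of a request routed to an instance that must load first."""
    return load_seconds + service_seconds


if __name__ == "__main__":
    burst = 8       # "almost 8 requests in a second"
    service = 1.0   # "each request takes around 1 second"
    load = 30.0     # "starting a new instance means around 30s"

    print(worst_wait_single_instance(burst, service))  # 8.0
    print(wait_on_new_instance(load, service))         # 31.0
```

Under these assumptions the slowest request on the warm instance still finishes well before a request that has to wait for a loading instance, which is the point Mos is making.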
>>
>> Do you really read my e-mail?
>>
>> Setting Max Pending Latency doesn't force requests to be in the pending queue for the specified time. Please use Min Pending Latency instead. Can you try this first? If it doesn't work, try 2 min idle instances then.
>>
>>> > ... here is what you've seen in the past weeks.
>>> >
>>> > * You have almost always used the 'Automatic-2' idle instance setting.
>>> > * More than 3 weeks ago, the number of loading requests was very low.
>>> > * Recently you have seen more loading requests than before.
>>>
>>> That's right! To be even more concrete: On August 16 the problems got significantly worse. Please check especially the period from August 16 until today.
>>>
>>> > First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 2. Can you describe what kind of changes in those versions?
>>>
>>> I checked it in our version control. As I wrote, no related changes were made! Just HTML/CSS stuff:
>>> * One picture upload
>>> * One HTML change
>>> * One JavaScript change
>>> * One CSS change
>>>
>>> > And, to be fair, we didn't think of any change in our scheduler around 3 weeks ago which can cause this issue.
>>>
>>> And around August 16?
>>
>> Sigh... isn't it a waste of time? What is the reason you picked that date?
>>
>>> > More than 3 weeks before, those 2 idle instances might have had longer lives than now, but it was not a concrete behavior. Please think this way: you were just kind of lucky.
>>>
>>> That shouldn't be luck! If GAE is not able to start Java instances in 5 to 10 seconds, there needs to be a guarantee that instances have longer lives. Otherwise Java applications on GAE are unusable because users would have a lot of 30-second wait times (--> "failed requests"). (See also the next comment regarding resident instances.)
>>>
>>> > If you want some instances always active, please set min idle instances.
>>>
>>> I tried this some days ago. I had one resident instance, but that changed nothing. Instances get started and stopped as before. I assumed that requests would go to the resident instance first, but that was not the case. The resident instance was idle, but a dynamic instance got started and the request waited 30 seconds.
>>>
>>> Please check other discussions on this list and issues that reported similar observations.
>>
>> So I'd say please try 2. If you still see user-facing loading requests, you need more resident instances to eliminate them.
>>
>>> > As you can see, I'm still not convinced that the scheduler is misbehaving. I understand that you're having experiences which are a bit worse than 3 weeks ago, and understand your feeling that you want to tell us 'fix it', but I'd say it's still something in the line of 'expected behavior', at least for now.
>>> > If you feel differently, please let me know.
>>>
>>> Yes, I do feel differently (please see answers above).
>>>
>>> Please accept http://code.google.com/p/googleappengine/issues/detail?id=8004
>>
>> So what is your expected behavior and actual result? Nobody in our team can do anything if you just keep saying "the setting that used to work doesn't work anymore" without trying my suggestions.
>>
>> I think my answer is clear at least for some points. 1) You'd better use 'min pending latency' instead of 'max pending latency' to prevent new instances from spinning up as much as possible. 2) If you need longer instance lives, set an appropriate number of min idle instances.
>>
>> -- Takashi
>>
>>> Thanks
>>> Mos
>>> http://www.mosbase.com
>>>
>>> On Fri, Aug 24, 2012 at 4:22 PM, Takashi Matsuo <[email protected]> wrote:
>>>> Hi Mos,
>>>>
>>>> On Fri, Aug 24, 2012 at 6:05 PM, Mos <[email protected]> wrote:
>>>>> > A possible explanation could be that the traffic pattern had changed.
>>>>>
>>>>> No. It's the same. Check, for example, the Requests/Second statistics of my application for the last 30 days!
>>>>>
>>>>> >> It's very obvious that one instance should be enough for my application. And that was almost the case the last months!
>>>>> > Actually it's not true. In particular, check this log:
>>>>>
>>>>> That's one exception where one client did 8 requests in a minute (+ one Pingdom). Nothing else this minute.
>>>>> In those exceptional cases it could be OK if a second instance starts. (Nevertheless, can't one instance handle 8 requests a minute?)
>>>>
>>>> The issue here is not 8 requests in a minute. Actually there were almost 8 requests in a second. So App Engine likely needed more than one instance at this particular moment. Anyway, as you say, it's probably just the reason for one of the loading requests you're seeing, and it is not very important for this topic.
>>>>
>>>> It's a bit of a digression, but at first glance the Requests/Second stat seems an appropriate data source for discussing how many instances are actually needed; in fact, it's not. Real traffic is not spread evenly.
>>>>
>>>>> As I described: Instances are started and stopped without reason, even if less traffic per minute is available!
>>>>
>>>> Okay. As far as I understand, here is what you've seen in the past weeks.
>>>>
>>>> * You have almost always used the 'Automatic-2' idle instance setting.
>>>> * More than 3 weeks ago, the number of loading requests was very low.
>>>> * Recently you have seen more loading requests than before.
>>>>
>>>> First of all, it seems that you deployed 2 new versions on Aug 1 and Aug 2. Can you describe what kind of changes were in those versions? I'd like to make sure that there are no changes that can cause the scheduler/app server to behave differently.
>>>>
>>>> Especially, if you want me to escalate this issue to our engineering team, you should provide the exact information. You say 'My application is unchanged', but in fact you deployed the new version on the day when, by your description, the issue started. I need to make sure that there is no big change which can cause something bad.
>>>>
>>>> And, to be fair, we didn't think of any change in our scheduler around 3 weeks ago which can cause this issue.
>>>>
>>>> Secondly, you're setting max idle instances = 2. It does not guarantee that you always have 2 instances. It just guarantees that we will never charge you for more than 2 idle instances at any time.
>>>>
>>>> More than 3 weeks before, those 2 idle instances might have had longer lives than now, but it was not a concrete behavior. Please think this way: you were just kind of lucky. Now, presumably one or two of those instances are occasionally killed for some reason (there should be certain legitimate reasons, but those are something you don't need to care about).
>>>>
>>>> If you want some instances always active, please set min idle instances. Certainly it will cost you a bit more, and you will lose the pending queue, but considering the access pattern of your app (no bursty traffic except for a few accesses from the iPhone browser), I would recommend trying this setting in order to achieve what you want here. I'd recommend 2 idle instances in this case, but you should decide the number.
>>>>
>>>>> > * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>
>>>>> " is high App Engine will allow requests to wait rather than start new Instances to process them"
>>>>> --> One attempt to stop GAE from creating unnecessary instances.
>>>>
>>>> I think you should set min pending latency instead of max pending latency if you want to prevent new instances from spinning up. However, if you're going to set min idle instances, this setting will almost lose its effect. If you don't want to set any min idle instances for whatever reason, please consider setting min pending latency instead of max pending latency.
>>>>
>>>>> > * Can you try automatic-automatic for idle instances setting?
>>>>>
>>>>> I played around with this the last few days and nothing changed. As I wrote: I had this configuration for months and it worked fine 3-4 weeks ago!
>>>>>
>>>>> > * What is the purpose of those pingdom check? What happens if you stop that?
>>>>>
>>>>> To be alerted if GAE is down again. "What happens if you stop that?" --> I wouldn't be angry anymore because I wouldn't notice downtimes of my GAE application. ;)
>>>>>
>>>>> Please forward http://code.google.com/p/googleappengine/issues/detail?id=8004 to the relevant GAE department.
>>>>
>>>> As you can see, I'm still not convinced that the scheduler is misbehaving. I understand that you're having experiences which are a bit worse than 3 weeks ago, and understand your feeling that you want to tell us 'fix it', but I'd say it's still something in the line of 'expected behavior', at least for now.
>>>>
>>>> If you feel differently, please let me know.
>>>>
>>>> Regards,
>>>>
>>>> -- Takashi
>>>>
>>>>> Thanks!
>>>>>
>>>>> On Fri, Aug 24, 2012 at 1:39 AM, Takashi Matsuo <[email protected]> wrote:
>>>>>> Hi Mos,
>>>>>>
>>>>>> On Thu, Aug 23, 2012 at 4:58 AM, Mos <[email protected]> wrote:
>>>>>>> Does anybody else experience abnormal behavior of the instance scheduler over the last three weeks (the last 7 days it got even worse)? (Java / HRD)
>>>>>>> Or does anybody have profound knowledge about it?
>>>>>>>
>>>>>>> Background: My application has been unchanged for weeks, the configuration has not changed, and the application's traffic is constant.
>>>>>>> Traffic: One request per minute from Pingdom and around 200 additional pageviews a day (== around 1500 pageviews a day). The peak is not more than 3-4 requests per minute.
>>>>>>
>>>>>> A possible explanation could be that the traffic pattern had changed.
>>>>>>
>>>>>>> It's very obvious that one instance should be enough for my application. And that was almost the case the last months!
>>>>>>
>>>>>> Actually it's not true. In particular, check this log:
>>>>>> https://appengine.google.com/logs?app_id=s%7Ekrisen-talk&version_id=1-0.360912144269287698&severity_level_override=1&severity_level=3&tz=Europe%2FBerlin&filter=&filter_type=regex&date_type=datetime&date=2012-08-23&time=23%3A57%3A00&limit=20&view=Search
>>>>>>
>>>>>> You can see the iPhone client repeatedly requests your dynamic resources in a very short amount of time. Presumably it's due to some kind of 'prefetch' feature of that device. Are you aware of those accesses, and that this access pattern can cause a new instance to start?
>>>>>>
>>>>>> I don't think this is the only reason, but this can explain why some portion of your loading requests is expected behavior.
>>>>>>
>>>>>> Now I'd like to ask you some questions.
>>>>>>
>>>>>> * What is the purpose of max-pending-latency = 14.9 setting?
>>>>>> * Can you try automatic-automatic for idle instances setting?
>>>>>> * What is the purpose of those pingdom check? What happens if you stop that?
>>>>>>
>>>>>>> But now GAE creates 3 instances most of the time, whereby one has a long lifetime of days and the other ones are restarted around 10 to 30 times a day.
>>>>>>> Because a loading request takes between 30s and 40s and requests are waiting for loading instances, there are many requests that fail (Users and Pingdom agree: *A request that takes more than a couple of seconds is a failed request!*)
>>>>>>>
>>>>>>> Please check the attached screenshots that show the behavior!
>>>>>>>
>>>>>>> Note:
>>>>>>> - Killing instances manually did not help
>>>>>>> - Idle Instances were ( Automatic – 2 ). Changing it to anything else did not change anything; e.g. ( Automatic – 4 )
>>>>>>>
>>>>>>> Thanks and Cheers
>>>>>>>
>>>>>>> Mos
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
>>>>>>> To post to this group, send email to google-a...@googlegroups.com.
>>>>>>> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
>>>>>>>
>>>>>>> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Takashi Matsuo | Developers Advocate | [email protected] >>>>>> >>>>>> -- >>>>>> You received this message because you are subscribed to the Google >>>>>> Groups "Google App Engine" group. >>>>>> To post to this group, send email to google-a...@googlegroups.**com. >>>>>> To unsubscribe from this group, send email to google-appengi...@** >>>>>> googlegroups.com. >>>>>> >>>>>> For more options, visit this group at http://groups.google.com/** >>>>>> group/google-appengine?hl=en<http://groups.google.com/group/google-appengine?hl=en> >>>>>> . >>>>>> >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google >>>>> Groups "Google App Engine" group. >>>>> To post to this group, send email to google-a...@googlegroups.**com. >>>>> To unsubscribe from this group, send email to google-appengi...@** >>>>> googlegroups.com. >>>>> >>>>> For more options, visit this group at http://groups.google.com/** >>>>> group/google-appengine?hl=en<http://groups.google.com/group/google-appengine?hl=en> >>>>> . >>>>> >>>> >>>> >>>> >>>> -- >>>> Takashi Matsuo | Developers Advocate | [email protected] >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Google App Engine" group. >>>> To post to this group, send email to google-a...@googlegroups.**com. >>>> To unsubscribe from this group, send email to google-appengi...@** >>>> googlegroups.com. >>>> >>>> For more options, visit this group at http://groups.google.com/** >>>> group/google-appengine?hl=en<http://groups.google.com/group/google-appengine?hl=en> >>>> . >>>> >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Google App Engine" group. >>> To post to this group, send email to google-a...@googlegroups.**com. >>> To unsubscribe from this group, send email to google-appengi...@** >>> googlegroups.com. 
> --
> To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/YIzxpRbmyHMJ.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
