Hey Vidya.

You are correct that instance start-up time depends heavily on your code, 
since each time a new instance is created it must load and initialize a 
fresh copy of your application before it can serve requests.

As for why you are seeing a single instance handling the bulk of your 
requests, this comes down to the App Engine scheduler, as you mentioned. 
The scheduler simply asks the first instance whether it can handle a 
request. Based on your scaling configuration for pending latency and 
concurrent requests, your first instance will tell the scheduler that it 
can handle an extra request, and so it does, leaving the rest of your 
instances waiting to handle any overflow.
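The fill-first behavior described above can be sketched as a toy simulation. This is a deliberately simplified model (the real scheduler also weighs pending latency, instance health, and warm-up state); the class and method names here are purely illustrative:

```java
import java.util.ArrayList;
import java.util.List;

public class SchedulerSketch {
    // Hypothetical stand-in for an App Engine instance.
    static class Instance {
        final int maxConcurrent;
        int inFlight = 0;
        Instance(int maxConcurrent) { this.maxConcurrent = maxConcurrent; }
        boolean canAccept() { return inFlight < maxConcurrent; }
    }

    // Assign each request to the FIRST instance that reports spare capacity,
    // mirroring how the scheduler asks the first instance before the others.
    static List<Integer> assign(List<Instance> instances, int requests) {
        List<Integer> counts = new ArrayList<>();
        for (int i = 0; i < instances.size(); i++) counts.add(0);
        for (int r = 0; r < requests; r++) {
            for (int i = 0; i < instances.size(); i++) {
                if (instances.get(i).canAccept()) {
                    instances.get(i).inFlight++;
                    counts.set(i, counts.get(i) + 1);
                    break;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Instance> pool = new ArrayList<>();
        for (int i = 0; i < 4; i++) pool.add(new Instance(8));
        // 10 simultaneous requests against 4 instances of capacity 8:
        // the first instance absorbs 8, the second takes the overflow of 2,
        // and the remaining two instances sit idle.
        System.out.println(assign(pool, 10));
    }
}
```

Note how the load only spills to the second instance once the first reports it is full; lowering `maxConcurrent` in this sketch is what spreads requests across the pool.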

If App Engine thinks you may need an extra instance warmed up in case of 
overflow, it will create one. This is why you see a single Dynamic 
instance at the bottom handling no requests. Again, App Engine sends 
requests to Dynamic instances, not to idle Resident instances. If no 
Dynamic instance is available, a Resident instance will be treated as a 
Dynamic instance and a new Resident instance will be spun up to meet your 
configured minimum idle instances 
<https://cloud.google.com/appengine/docs/java/config/appref#scaling_elements>.

To configure your scaling options 
<https://cloud.google.com/appengine/docs/java/config/appref#scaling_elements> 
so that requests are spread more evenly across available instances, reduce 
the number of concurrent requests a single instance is allowed to handle, 
reduce the minimum pending latency a request must wait in an instance's 
pending queue, and reduce the maximum pending latency so that a request is 
handed to a new instance after a shorter wait. Note that I would not 
recommend setting any of these so low that each instance handles only a 
single request at a time: you still want each instance to serve multiple 
requests, to balance cost and performance.
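For a Java app, these knobs live in the automatic-scaling section of appengine-web.xml. The element names below come from the scaling reference linked above; the specific values are only illustrative starting points, not recommendations:

```xml
<appengine-web-app xmlns="http://appspot.com/ns/1.0">
  <automatic-scaling>
    <!-- Keep one warm Resident instance for overflow. -->
    <min-idle-instances>1</min-idle-instances>
    <!-- Lower this to spread requests across more instances. -->
    <max-concurrent-requests>10</max-concurrent-requests>
    <!-- Shorter pending-latency window: spin up new instances sooner. -->
    <min-pending-latency>30ms</min-pending-latency>
    <max-pending-latency>100ms</max-pending-latency>
  </automatic-scaling>
</appengine-web-app>
```

Tune these against what you observe in Trace rather than setting them to extremes.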

Continue to use the Stackdriver Trace <https://cloud.google.com/trace/> 
tool to see the breakdown of latency for requests, and use it to find the 
optimal scaling settings for your app so that requests are not waiting too 
long in a pending queue for the requests in front of them to finish. 
Ideally, optimizing your code to handle requests quickly and 
asynchronously (for example, using the Task Queue to perform long 
image-manipulation tasks instead of making the user wait) will make your 
application scale well in the cloud.
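The "respond fast, defer the slow work" pattern looks roughly like this. On App Engine you would enqueue the work with the Task Queue API rather than spawn your own threads; this self-contained sketch uses a plain ExecutorService only so the idea runs anywhere, and all names in it are hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class DeferredWorkSketch {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final BlockingQueue<String> done = new LinkedBlockingQueue<>();

    // Handle a request: enqueue the slow work, then return immediately so
    // the instance frees up to accept the next request.
    public String handle(String imageId) {
        worker.submit(() -> done.add(process(imageId))); // runs later
        return "202 Accepted: " + imageId;               // fast response
    }

    // Stand-in for a long image-manipulation task.
    private String process(String imageId) {
        return "processed:" + imageId;
    }

    // Blocks until the deferred work finishes (for demonstration only).
    public String awaitResult() throws InterruptedException {
        return done.take();
    }
}
```

Because the handler returns before the heavy work runs, each request occupies the instance only briefly, which is exactly what keeps pending latency low under load.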

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/231d24e7-ac1c-4eba-bfb3-8fada9677094%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.