Hey Vidya. You are correct that instance start time depends heavily on your code: each time a new instance is created, it must load and prepare a fresh copy of your code before it can serve requests.
As for why you are seeing a single instance handle the bulk of your requests, this comes down to the App Engine scheduler, as you mentioned. The scheduler simply asks the first instance whether it can handle a request. Based on your scaling configuration for pending latency and concurrent requests, your first instance tells the scheduler that it can take another request, and so it does, leaving the rest of your instances waiting to handle any overflow.
If App Engine thinks you may need an extra instance warmed up in case of overflow, it will create one. This is why you see a single Dynamic instance at the bottom handling no requests. Again, App Engine sends requests to Dynamic instances and not to idle Resident instances. If no Dynamic instance is available, your Resident instance will be treated as a Dynamic instance, and a new Resident instance will be started to meet your configured minimum idle instances <https://cloud.google.com/appengine/docs/java/config/appref#scaling_elements>.
To configure your scaling options <https://cloud.google.com/appengine/docs/java/config/appref#scaling_elements> so that requests are spread more evenly across available instances, reduce the number of concurrent requests a single instance is allowed to handle, reduce the minimum pending latency (how long a request is allowed to wait in an instance's pending queue), and reduce the maximum pending latency so that a request is handed to a new instance after a shorter wait.
Note that I would not recommend setting any of these to zero, which would force each request onto its own instance. You still want each instance to handle multiple requests in order to balance cost and performance.
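
To make the trade-off concrete, here is a toy model of the "ask the first instance first" behaviour described above. It is only a sketch and not the real App Engine scheduler: the class name, method name and parameters are made up for illustration, and it ignores pending latency, request duration and idle-instance warm-up. It only shows how a per-instance concurrency limit decides whether one instance absorbs a burst or the burst overflows onto further instances.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: NOT the real App Engine scheduler. */
class FirstFitSchedulerSketch {

    /**
     * Assigns a burst of simultaneous requests to instances, first fit.
     *
     * @param requests      number of requests arriving at the same time
     * @param maxConcurrent hypothetical per-instance concurrency limit
     * @return how many requests each instance ends up handling, in order
     */
    static List<Integer> distribute(int requests, int maxConcurrent) {
        if (maxConcurrent < 1) {
            throw new IllegalArgumentException("maxConcurrent must be >= 1");
        }
        List<Integer> load = new ArrayList<>();
        for (int r = 0; r < requests; r++) {
            boolean placed = false;
            // Ask instances in order; the first one with spare capacity wins.
            for (int i = 0; i < load.size(); i++) {
                if (load.get(i) < maxConcurrent) {
                    load.set(i, load.get(i) + 1);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                // Assumption for this sketch: overflow starts a new instance.
                load.add(1);
            }
        }
        return load;
    }

    public static void main(String[] args) {
        System.out.println(distribute(10, 10)); // [10]
        System.out.println(distribute(10, 4));  // [4, 4, 2]
        System.out.println(distribute(10, 1));  // ten instances, one request each
    }
}
```

With a limit of 10 the first instance takes the whole burst, which matches what you are seeing. With a limit of 1 every request needs its own instance, which is the costly extreme I would avoid.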
Continue to use the Stackdriver Trace <https://cloud.google.com/trace/> tool to see the latency breakdown for your requests, and use it to find scaling settings for your app under which requests do not wait too long in a pending queue for the requests ahead of them to finish. Ideally, optimize your code to handle requests very quickly in an asynchronous style (for example, use the Task Queue to perform long image manipulation tasks instead of forcing a user to wait). That will make your application scale well in the cloud.
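
For reference, here is a rough sketch of where the scaling settings discussed above live for a Java app, assuming you are on automatic scaling and configure it in appengine-web.xml. The element names are the ones documented on the scaling elements page linked earlier. The values are placeholders for illustration only, not recommendations, so adjust them against what Trace shows for your app.

```xml
<!-- Goes inside <appengine-web-app> in appengine-web.xml. -->
<!-- All values below are illustrative assumptions, not tuned numbers. -->
<automatic-scaling>
  <!-- Resident instances kept warm for overflow. -->
  <min-idle-instances>1</min-idle-instances>
  <!-- Lower this to spread requests across more instances. -->
  <max-concurrent-requests>8</max-concurrent-requests>
  <!-- Shortest time a request waits in the pending queue before a new instance may be started. -->
  <min-pending-latency>30ms</min-pending-latency>
  <!-- Longest time a request may wait before a new instance is started for it. -->
  <max-pending-latency>500ms</max-pending-latency>
</automatic-scaling>
```

Change one setting at a time and compare the traces before and after, so you can tell which one actually moved your latency and instance count.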
