>From the [documentation](https://cloud.google.com/appengine/docs/flexible/python/how-requests-are-handled) on how GAE Flexible handles requests, it says that "An instance can handle multiple requests concurrently" but I don't know what this exactly means.
Let's say my application can process a single request every 60 seconds. After starting to process the initial request, will another request (or 3) that occur say 30 seconds after (so halfway done with the first request), be handled by the same instance, or will it trigger autoscaling and spin up more instances to handle those new requests? This situation assumes that CPU utilization for the first request is still below the scaling CPU-utilization threshold. I'm worried that because it takes my instance 60 seconds to process a single request and I will be receiving multiple requests at a time, that I'll be inefficiently triggering autoscaling even if there is enough processing power to handle additional requests on the same instance. Is this how it works? I would ideally like to be able to multi-thread my processing and accept additional requests on the same instance while still under the CPU utilization threshold. The documentation for concurrent requests is scarce for the Flexible environment unlike the Standard environment so I want to be sure. -- You received this message because you are subscribed to the Google Groups "Google App Engine" group. To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscr...@googlegroups.com. To post to this group, send email to google-appengine@googlegroups.com. Visit this group at https://groups.google.com/group/google-appengine. To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/ff0754bd-cc66-431f-bc14-a2dc5da2e9d9%40googlegroups.com. For more options, visit https://groups.google.com/d/optout.