The 
[documentation](https://cloud.google.com/appengine/docs/flexible/python/how-requests-are-handled) 
on how GAE Flexible handles requests says that "An instance can handle 
multiple requests concurrently", but I don't know exactly what this means. 

Let's say my application takes 60 seconds to process a single request. 

Once it has started processing the first request, will another request (or 
three) arriving, say, 30 seconds later (halfway through the first request) 
be handled by the same instance, or will it trigger autoscaling and spin up 
more instances to handle those new requests? This assumes that CPU 
utilization while processing the first request is still below the 
autoscaling CPU-utilization threshold. 

I'm worried that, because my instance takes 60 seconds to process a single 
request and I will receive multiple requests at a time, I'll trigger 
autoscaling unnecessarily even when there is enough processing power to 
handle additional requests on the same instance. Is this how it works? 
Ideally, I would like to multi-thread my processing and accept additional 
requests on the same instance while it is still under the CPU-utilization 
threshold. 
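
To make the scenario concrete, here is a minimal sketch of what I mean by 
overlapping requests in one process. It is only an illustration, not how 
GAE Flexible itself routes or scales anything: `handle_request`, 
`serve_overlapping`, and `JOB_SECONDS` are hypothetical names, the 60-second 
job is scaled down to 0.6 seconds, and the work is simulated with a sleep. 

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the 60-second job, scaled down so the sketch runs fast.
JOB_SECONDS = 0.6


def handle_request(request_id):
    """Simulate one slow request and report which thread handled it."""
    started = time.monotonic()
    time.sleep(JOB_SECONDS)  # placeholder for the real processing (assumption)
    elapsed = time.monotonic() - started
    return request_id, threading.current_thread().name, elapsed


def serve_overlapping(late_requests=3, workers=4):
    """Start one request, then submit more halfway through it, all in one process."""
    wall_start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handle_request, 0)]
        time.sleep(JOB_SECONDS / 2)  # the later requests arrive halfway through
        futures += [
            pool.submit(handle_request, i) for i in range(1, late_requests + 1)
        ]
        results = [f.result() for f in futures]
    return results, time.monotonic() - wall_start


if __name__ == "__main__":
    results, total = serve_overlapping()
    for request_id, thread_name, elapsed in results:
        print(f"request {request_id} on {thread_name}: {elapsed:.2f}s")
    print(f"total wall time: {total:.2f}s")
```

With the requests overlapping, the total wall time should be well under four 
times `JOB_SECONDS`. Note that the sleep releases the interpreter lock, so 
this models waiting-heavy work; CPU-bound work in standard CPython threads 
would not overlap this cleanly. 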

The documentation on concurrent requests is scarce for the Flexible 
environment, unlike for the Standard environment, so I want to be sure. 
