Note that you are using "manual_scaling: instances: 1", so you have only a single instance available to accept requests.
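For reference, that setting usually sits in app.yaml roughly like this (a minimal sketch, assuming the flexible environment; the runtime line is a placeholder for whatever your app actually declares):

    # app.yaml - a single manually scaled instance (placeholder values)
    runtime: python
    env: flex
    manual_scaling:
      instances: 1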
The web server (Nginx) in front of your application code accepts the request from the load balancer and attempts to route it to the proper service in your code. If your code is too busy to respond (meaning it is blocking on an older request), the Nginx proxy will time out (after retries) and return "502 Bad Gateway" to the load balancer. It is then up to the client to retry sending requests until your application code is free to accept new requests.

- It is therefore recommended to ensure that your application never blocks on a single request and is able to handle concurrent requests. As per the documentation [1], the default Python Gunicorn config only uses one worker, which can handle only a single request at a time (i.e. no concurrent requests). It is therefore recommended to increase the number of workers as explained in [1] and to use async workers [2] so that your single instance can accept concurrent requests; see the example entrypoint after the references below.

- Once your single instance is able to handle more requests, it may then become overworked (depending on your amount of incoming traffic) and bottleneck on the CPU. It is then recommended to either increase the instance resources and/or use more than one instance (automatic scaling is normally recommended for high-traffic applications); see the scaling sketch below.

[1] https://cloud.google.com/appengine/docs/flexible/python/runtime#recommended_gunicorn_configuration
[2] http://docs.gunicorn.org/en/latest/design.html#async-workers
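For example, the Gunicorn entrypoint in app.yaml could be adjusted along these lines (only a sketch based on [1] and [2]; the "main:app" module, the worker count, and the choice of gevent are assumptions for illustration, and gevent would need to be added to requirements.txt):

    # app.yaml - more workers plus async (gevent) workers, placeholder values
    entrypoint: gunicorn -b :$PORT main:app --workers 3 --worker-class gevent

A common starting point is workers = 2 * CPU cores + 1, and an async worker class such as gevent lets each worker serve many concurrent requests as long as the application code is mostly I/O-bound.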
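If the single instance then becomes CPU-bound, the instance resources and scaling mode could be changed roughly like this (again only a sketch with placeholder numbers, assuming the flexible environment; the automatic_scaling block would replace manual_scaling, since the two are mutually exclusive):

    # app.yaml - larger instances and automatic scaling (placeholder values)
    resources:
      cpu: 2
      memory_gb: 4
    automatic_scaling:
      min_num_instances: 2
      max_num_instances: 10
      cpu_utilization:
        target_utilization: 0.6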
