[google-appengine] "The process handling this request has unexpectedly died... (Error Code 203)"

Charles Batty-Capps Sat, 04 Aug 2018 20:53:32 -0700

To whomever at Google with knowledge of this error message,
We are using App Engine Standard with a mix of Java 7 and Java 8 (on Tuesday our
production environment will all be on Java 8). We are using *basic scaling*,
generally with a max of ~20 instances, and we rarely see services scale above 5
instances. We frequently get this error in our logs:

> The process handling this request unexpectedly died. This is likely to
> cause a new process to be used for the next request to your application.
> (Error code 203)

We used to think this was due to an OOM (and an OOM may well be one possible
root cause). However, we've been seeing this error more and more frequently,
and when we checked the Cloud Console, the services were at low memory usage
when it happened, with no spike in memory usage or other anomaly.
So it seems fairly safe to assume that this error has multiple root causes
(perhaps any Java *Error*?).
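
One way to test the OOM hypothesis from inside the application, rather than
relying only on the Cloud Console graphs, would be to log JVM heap usage at the
start and end of a request or deferred task. Below is a minimal sketch using
only the standard `java.lang.Runtime` and `java.util.logging` APIs. The class
name `HeapSnapshot` and its methods are hypothetical helpers, not part of App
Engine, and the numbers reflect only the JVM heap, not total process memory.

```java
import java.util.logging.Logger;

// Hypothetical helper (not an App Engine API): reports JVM heap usage so it
// can be correlated with the requests that end in Error code 203.
public class HeapSnapshot {
    private static final Logger LOG = Logger.getLogger(HeapSnapshot.class.getName());
    private static final long MIB = 1024L * 1024L;

    /** Heap currently in use by the JVM, in bytes. */
    public static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Formats used and maximum heap, in MiB, prefixed with a caller-supplied label. */
    public static String describe(String label) {
        Runtime rt = Runtime.getRuntime();
        return String.format("%s: heap used=%d MiB, max=%d MiB",
                label, usedBytes() / MIB, rt.maxMemory() / MIB);
    }

    /** Writes the snapshot to the log at INFO level. */
    public static void log(String label) {
        LOG.info(describe(label));
    }
}
```

Calling `HeapSnapshot.log("request start")` and `HeapSnapshot.log("request end")`
around a handler would show whether heap usage is actually close to the maximum
shortly before a process dies.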

The error seems to correlate with high traffic, but the affected services
haven't come close to scaling to their configured maximum number of instances,
as mentioned above.

So my questions to you are:

- What are all the root causes of this issue?
- How can we troubleshoot this issue?
- FYI, I don't believe this is due to performance; we've done a lot of
  work on performance, and requests generally take under 500ms for all
  endpoints, except for a few endpoints that may take up to 10s when under
  load. When we see this error, the request time is often under 100ms. We
  haven't been seeing any 60s timeouts.

*Some troubleshooting info*
This is happening mostly for 2 of our microservices, and in deferred tasks
of a third microservice. The one that sees this error the most seems to be
scaling the number of "active" instances up and down fairly frequently. I'm
not sure how "active" is determined, other than the obvious criterion of
whether the instance is receiving traffic.

<https://lh3.googleusercontent.com/-ceT40LYevCo/W2ZyNHz-wUI/AAAAAAAABuA/2N0agMUQSmwSy7nAeai6bupy1KDdEL_yACLcBGAs/s1600/service_num_instances.png>

This happens for a wide variety of requests: requests from our app to our
mobile proxy service, requests between services, and deferred tasks. It
happens for some slow requests, for some fast requests, for background
threads, etc., so it's quite difficult to pinpoint the cause. We can do some
blanket work to try to "generally improve performance", but that's a rather
inefficient way to solve this problem. I appreciate any help on this matter.
I may also create a support ticket, but the ticket system often doesn't
provide very useful info.

Here are some example requests that all had this error:

<https://lh3.googleusercontent.com/-fq5O1vWOKQ0/W2Z0WT_knqI/AAAAAAAABuM/dfjLtevtLhU-9o9iuTzlzzFU8-_P7eE2wCLcBGAs/s1600/logs_with_process_death.png>

Thanks for any help!
