To whomever at Google with knowledge of this error message,
We are using App Engine Standard with a mix of Java 7 and Java 8 (on Tuesday 
our production environment will be entirely on Java 8). We are using 
*basic scaling*, generally with a max of ~20 instances, and we rarely see 
services scale above 5 instances. We frequently get this error in our logs:

> The process handling this request unexpectedly died. This is likely to 
> cause a new process to be used for the next request to your application. 
> (Error code 203)


We used to think this was due to an OOM (and an OOM may well be one possible 
root cause). However, we've been seeing this error more and more frequently, 
and when we checked the Cloud Console, the services were at low memory usage 
when it happened, with no spike in memory usage or other anomaly. 
So it seems fairly safe to assume that this error has multiple root causes 
(perhaps any Java *Error*?).
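
To cross-check the memory theory from inside an instance rather than from the Cloud Console graphs, heap usage could be logged next to each request. Below is a minimal sketch of that idea. The class name `HeapSnapshot` and its methods are hypothetical and not part of our code base; it uses only the standard `Runtime` and `java.util.logging` APIs. It reports the Java heap only, so it would not show memory used outside the heap.

```java
import java.util.Locale;
import java.util.logging.Logger;

/** Hypothetical helper (an assumption, not existing code): summarizes JVM heap usage for logging. */
public final class HeapSnapshot {
    private static final Logger LOG = Logger.getLogger(HeapSnapshot.class.getName());
    private static final long MB = 1024L * 1024L;

    private HeapSnapshot() {}

    /** Heap currently in use, in bytes: the allocated heap minus its free portion. */
    public static long usedBytes(long totalBytes, long freeBytes) {
        return totalBytes - freeBytes;
    }

    /** Builds a one-line summary such as "heap used=12MB max=256MB (4%)". */
    public static String describe(long totalBytes, long freeBytes, long maxBytes) {
        long used = usedBytes(totalBytes, freeBytes);
        // Integer percentage of the maximum heap; guard against a zero maximum.
        long percent = maxBytes > 0 ? (used / MB) * 100 / Math.max(1, maxBytes / MB) : 0;
        return String.format(Locale.ROOT, "heap used=%dMB max=%dMB (%d%%)",
                used / MB, maxBytes / MB, percent);
    }

    /** Logs current heap usage with a caller-supplied label, e.g. the request path. */
    public static void log(String label) {
        Runtime rt = Runtime.getRuntime();
        LOG.info(label + " " + describe(rt.totalMemory(), rt.freeMemory(), rt.maxMemory()));
    }
}
```

Calling `HeapSnapshot.log(...)` at the start and end of a request handler would leave a heap reading in the same log stream as the 203 error, which would make it easier to see whether the process died at low heap usage.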

This seems to be related to high traffic, but the services getting this 
error have not come close to scaling to their configured maximum number of 
instances, as mentioned above.

So my questions to you are:

   - What are all the root causes of this issue?
   - How can we troubleshoot this issue?
   - FYI, I don't believe this is due to performance; we've done a lot of 
   work on performance and generally requests are under 500ms for all 
   endpoints, except for a few endpoints that may take up to 10s when under 
   load. When we see this error, the request time was often under 100ms. We 
   haven't been seeing any 60s timeouts.


*Some troubleshooting info*
This is happening mostly for 2 of our microservices, and in deferred tasks 
of a third microservice. The one that sees this error the most seems to be 
scaling the number of "active" instances up and down fairly frequently. I'm 
not sure how "active" is determined, other than the obvious criterion of 
whether there is traffic to the instance.

<https://lh3.googleusercontent.com/-ceT40LYevCo/W2ZyNHz-wUI/AAAAAAAABuA/2N0agMUQSmwSy7nAeai6bupy1KDdEL_yACLcBGAs/s1600/service_num_instances.png>

This happens for a wide variety of requests: requests from our app to our 
mobile proxy service, requests between services, and deferred tasks. It 
happens for some slow requests, for some fast requests, for background 
threads, etc., so it's quite difficult to pinpoint the cause. We can do some 
blanket work to try to "generally improve performance", but that's a rather 
inefficient way to solve this problem. I appreciate any help on this matter. 
I may also create a support ticket, but the ticket system often doesn't 
provide very useful info. 

Here are some example requests that all had this error:

<https://lh3.googleusercontent.com/-fq5O1vWOKQ0/W2Z0WT_knqI/AAAAAAAABuM/dfjLtevtLhU-9o9iuTzlzzFU8-_P7eE2wCLcBGAs/s1600/logs_with_process_death.png>
 

Thanks for any help!
