Hey Vinay Chitlangia,
Thanks for doing some preliminary troubleshooting and for linking that
interesting article. App Engine runs Nginx processes to route requests to
your application's handlers. Handlers for static assets, for instance, are
served directly by this Nginx process, bypassing the application
altogether to save precious application resources.
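For illustration, such a static handler is declared in app.yaml; the URL and directory below are invented for the example, and the exact syntax depends on which environment you are running in:

```yaml
# Hypothetical app.yaml fragment: requests under /static are served
# directly by the fronting infrastructure and never reach application code.
handlers:
- url: /static
  static_dir: static
```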
The Nginx process will often serve a *502* if the application raises an
exception, if an internal API call raises an exception, or if the request
simply takes too long. As such, the status code by itself does not tell us
much.
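As a side note, one way to make the first case (an application exception) easier to diagnose is to catch it inside the handler and return an explicit 500 with a logged cause; an uncaught exception that kills the request mid-flight leaves Nginx nothing to proxy, which is when clients see the bare 502 page. A minimal sketch, with class and method names invented for the example:

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeHandler {
    private static final Logger LOG = Logger.getLogger(SafeHandler.class.getName());

    /** Minimal stand-in for an HTTP response: status code plus body. */
    public static final class Response {
        public final int status;
        public final String body;
        Response(int status, String body) { this.status = status; this.body = body; }
    }

    /** Runs the handler body; maps success to 200 and any exception to 500. */
    public static Response handle(Callable<String> body) {
        try {
            return new Response(200, body.call());
        } catch (Exception e) {
            // Logged here, so the failure is visible in the app logs even
            // though the client only sees a 5xx.
            LOG.log(Level.SEVERE, "handler failed", e);
            return new Response(500, "internal error");
        }
    }
}
```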
Looking at the GAE logs for your application, I found the *502*s you
mentioned. One thing I noticed is that they all come from the */read*
endpoint. Judging by the name, I assume this endpoint reads some data from
BigTable. To investigate further, could you provide some additional
information:
- What exactly is happening at the */read* endpoint? A code sample
would be ideal if that's not too sensitive.
- What kind of error handling exists in said endpoint if the BigTable
API returns non-success responses?
- Can you log the various steps in the */read* endpoint? This might help
identify how far a request gets before the *502* is served. It would also
help confirm that your application is actually receiving the request,
which I can't currently confirm from the logs.
- If said endpoint does in fact read from BigTable, which API and Java
client library are you using?
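To illustrate the logging suggestion above, here is a hedged sketch of per-stage tracing; the stage names and the helper class are invented for the example, not taken from your code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import java.util.logging.Logger;

public class ReadEndpointTracer {
    private static final Logger LOG = Logger.getLogger(ReadEndpointTracer.class.getName());
    private final List<String> stages = new ArrayList<>();

    /** Logs entry and exit around one stage so the logs show how far a request got. */
    public <T> T stage(String name, Supplier<T> work) {
        LOG.info("stage start: " + name);
        stages.add(name);
        T result = work.get();
        LOG.info("stage done: " + name);
        return result;
    }

    public List<String> completedStages() { return stages; }
}
```

Called as, say, `tracer.stage("parse-request", ...)` followed by `tracer.stage("bigtable-read", ...)`, the last "stage start" line without a matching "stage done" in the logs tells you where the request died before the *502* was served.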
Regarding the article you linked: while the configuration of an HTTPS load
balancer and nginx.conf can be very important, both the load-balancing
component and nginx.conf are out of the developer's hands on App Engine.
The scaling settings, health check settings, and handlers in your app.yaml
are the only controls you have that affect load balancing and nginx
behavior.
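For reference, the knobs I mean live in app.yaml and look roughly like this for the flexible environment; all values below are illustrative, not recommendations:

```yaml
# Illustrative app.yaml fragment for the flexible environment; tune the
# numbers for your own traffic pattern.
runtime: java
env: flex

automatic_scaling:
  min_num_instances: 2
  max_num_instances: 10
  cool_down_period_sec: 120
  cpu_utilization:
    target_utilization: 0.6

health_check:
  enable_health_check: true
  check_interval_sec: 5
  timeout_sec: 4
  unhealthy_threshold: 2
  healthy_threshold: 2
```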
On Wednesday, February 8, 2017 at 11:27:43 AM UTC-5, Vinay Chitlangia wrote:
>
> Might be related:
>
> https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340#.6k2laoada
>
> The symptoms mentioned in this blog (somewhat moderate request rates, no
> logs) match our observations.
>
> I do not see the "backend_connection_closed_before_data_sent_to_client"
> status in the logs.
>
> The error message for a failed request received by the client is:
> 11:12:44.549com.yawasa.server.storage.RpcStorageService LogError:
> <html><head><title>502 Bad Gateway</title></head><body
> bgcolor="white"><center><h1>502 Bad
> Gateway</h1></center><hr><center>nginx</center></body></html> (
> RpcStorageService.java:137
> <https://console.cloud.google.com/debug/fromlog?appModule=default&appVersion=1&file=RpcStorageService.java&line=137&logInsertId=589569d9000e7bf6825479e4&logNanos=1486186963359794000&nestedLogIndex=0&project=village-test>
> )
>
> The mention of nginx in the log message appears promising. We are not
> using nginx deliberately, so I am assuming this is something happening
> under the hood.
>
> On Tuesday, February 7, 2017 at 11:08:55 AM UTC+5:30, Vinay Chitlangia
> wrote:
>>
>> Hi,
>> We are seeing intermittent occurrences of 502 Bad Gateway error in our
>> server.
>> About 0.5% requests fail with this error.
>>
>> Our setup is:
>> Flex running jetty9-compat
>> F1 machine
>> 1 server
>>
>> Our request pattern is bursty, so the server gets ~30 requests in
>> parallel. The failures, when they happen, are clustered: over a period of
>> about 10 seconds one would see 3-4 errors.
>>
>> The requests that complete successfully finish in 50-100 ms, so it does
>> not appear that the server is under major load and unable to keep up. To
>> rule out this possibility, I started the servers with 5 replicas.
>> However, the failure percentage did not change.
>>
>> From the looks of it, there seems to be some throttling or quota issue at
>> play. I tried tweaking the max-concurrent-requests param, setting it to
>> 300, but that did not make any difference either.
>>
>> I do not see new instances being created at the time of failure either.
>>
>>
>> The request log for the failed request:
>> 09:57:30.686  POST  502  262 B  4 ms  AppEngine-Google; (+
>> http://code.google.com/appengine; appid: s~village-test)  /read
>> 107.178.194.3 - - [07/Feb/2017:09:57:30 +0530] "POST /read HTTP/1.1" 502
>> 262 - "AppEngine-Google; (+http://code.google.com/appengine; ms=4
>> cpu_ms=0 cpm_usd=2.9279999999999998e-8 loading_request=0 instance=-
>> app_engine_release=1.9.48 trace_id=-
>> {
>> protoPayload: {…}
>> insertId: "58994cb30002335cb47fd364"
>> httpRequest: {…}
>> resource: {…}
>> timestamp: "2017-02-07T04:27:30.686052Z"
>> labels: {…}
>>
>> operation: {…}
>> }
>>
>> Looking at other logs from around the time of the failure, I see:
>> 09:57:30.000[error] 32#32: *35107 recv() failed (104: Connection reset by
>> peer) while reading response header from upstream, client: 169.254.160.2,
>> server: , request: "POST /read HTTP/1.1", upstream: "
>> http://172.17.0.4:8080/read", host: "bigtable-dev.appspot.com"
>> AFAICT this request never made it to our servlet.
>>
>
--
You received this message because you are subscribed to the Google Groups
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit
https://groups.google.com/d/msgid/google-appengine/ea48946b-fbd9-47af-a7b4-136493f0d583%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.