Thanks Andrei, much appreciated!

Cheers,
Branden

On Mon, Apr 22, 2013 at 4:20 AM, Andrei Belov <[email protected]> wrote:
> Branden,
>
> On Apr 5, 2013, at 2:24 , Branden Visser <[email protected]> wrote:
>
>> Hello, I've found that when there are upstream servers unavailable in
>> my upstream group, applying a little bit of load on the server (i.e.,
>> just myself browsing around quickly, 2-3 req/s max) results in the
>> following errors even for upstream servers that are available and
>> well:
>>
>> 2013/04/04 22:02:21 [error] 4211#0: *2898 writev() failed (134:
>> Transport endpoint is not connected) while sending request to
>> upstream, client: 184.94.54.70, server: , request: "GET /api/ui/skin
>> HTTP/1.1", upstream: "http://10.112.5.119:2001/api/ui/skin", host:
>> "mysite.org", referrer: "http://mysite.org/search"
>>
>> In this particular example, I have 4 upstreams, 3 servers are shut
>> down (all except 10.112.5.119). If I comment out the 3 other upstream
>> servers, I cannot reproduce this error.
>>
>> Running SmartOS (Joyent cloud)
>>
>> $ nginx -v
>> nginx version: nginx/1.3.14
>>
>> These are things I tried to no avail:
>>
>> * I used to have keepalive 64 on the upstream, I removed it
>> * Nginx used to run as a non-privileged user, I switched it to root
>> (prctl reports that privileged users should have 65,000 nofiles
>> allowed)
>> * I used to have worker_processes set to 5, I increased it to 16
>> * The upstream server configuration used to not have max_fails *or*
>> max_timeout, I added those in trying to limit the amount of times
>> nginx tried to access the downed upstream servers
>> * I used to have the proxy_connect_timeout unspecified so it should
>> have defaulted to 60s, I tried setting it to 1s
>> * I tried commenting out all the rate-limiting directives
>>
>> The URLs I'm hitting in my tests are all those for the "tenantworkers"
>> upstream.
>>
>> Any idea? I would think I probably have a resource limit issue, or an
>> issue with the back-end server, but it just doesn't make sense that
>> everything is OK after I comment out the downed upstreams. My concern
>> is that the system will crumble under real load when even 1 upstream
>> becomes unavailable.
>>
>> Thanks,
>> Branden
>
> Thanks for reporting this!
>
> There was actually a bug in /dev/poll event method, fix included in nginx
> 1.3.16.
>
> _______________________________________________
> nginx mailing list
> [email protected]
> http://mailman.nginx.org/mailman/listinfo/nginx
