Hi Maxim,

I found the root cause: it was a problem in my plugin. Your explanation of posted requests helped a lot in debugging it. For some unavoidable reasons, my plugin holds a reference to the ngx_http_request_t and calls finalize once it is done or sees an error. I didn't call ngx_http_run_posted_requests() afterwards, as ngx_http_request_handler() does. The actual call to writev happens *after* the request handler returns, so the handler doesn't see c->error or the posted request and hence doesn't clean it up.
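The flow described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: every type and function name in it (model_request_t, model_connection_t, model_finalize, model_run_posted, model_plugin_done) is hypothetical, and the single posted slot kept on the connection is a simplification that does not mirror nginx's real data layout or API.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not nginx's real types or API. */

typedef struct model_request model_request_t;
typedef void (*model_handler_pt)(model_request_t *r);

typedef struct {
    int              fd_open;  /* 1 while the socket is still open */
    int              error;    /* plays the role of c->error */
    model_request_t *posted;   /* single-slot "posted requests" queue */
} model_connection_t;

struct model_request {
    model_connection_t *c;
    model_handler_pt    write_event_handler;
};

/* Plays the role of the terminate handler: releases the socket. */
void model_terminate_handler(model_request_t *r)
{
    r->c->fd_open = 0;
}

/* Plays the role of finalize on error: termination is deferred by
 * installing the terminate handler and posting the request. */
void model_finalize(model_request_t *r)
{
    assert(r != NULL && r->c != NULL);
    r->write_event_handler = model_terminate_handler;
    r->c->posted = r;
}

/* Plays the role of running posted requests: drains the queue. */
void model_run_posted(model_connection_t *c)
{
    while (c->posted != NULL) {
        model_request_t *r = c->posted;
        c->posted = NULL;
        r->write_event_handler(r);
    }
}

/* A plugin code path that finalizes outside the normal event handler.
 * run_posted selects whether the posted queue is drained afterwards. */
void model_plugin_done(model_request_t *r, int run_posted)
{
    model_connection_t *c = r->c;  /* saved before any handler runs */

    c->error = 1;                  /* writev failed on this connection */
    model_finalize(r);

    if (run_posted) {
        model_run_posted(c);
    }
}
```

With run_posted set to 0 the model leaves fd_open at 1 and the request stays queued, which corresponds to the leaked socket. With run_posted set to 1 the deferred handler runs and the descriptor is released.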
I will fix my plugin to follow the normal nginx flow soon, but until then this fix (calling ngx_http_run_posted_requests() after finalize) solves my problem. I did look at the diff from 1.0.5 to 1.2.6 and couldn't see what could have caused this. Thanks again for the help and the really helpful reply.

+Fasih

On Fri, May 24, 2013 at 10:19 PM, Fasih <faskiri.de...@gmail.com> wrote:

> Hello
>
> Thanks for the really quick reply. The ngx_http_run_posted_requests
> totally made sense and explained the bit that I was missing.
>
> I get the bug when writev called in the context of a request handler gets
> an error. The repro I had was basically with nginx running on a server
> and client on my laptop over wireless @ work. I am not @ work now and from
> my home connection I am unable to repro this. Will send you the backtrace
> as soon as I get it again.
>
> On Fri, May 24, 2013 at 8:24 PM, Maxim Dounin <mdou...@mdounin.ru> wrote:
>
>> Hello!
>>
>> On Fri, May 24, 2013 at 07:09:58PM +0530, Fasih wrote:
>>
>> > Hi all
>> >
>> > I have been seeing slow but steady socket leak in nginx ever since I
>> > upgraded from 1.0.5 to 1.2.6. I have my custom module in nginx which I
>> > was sure what was the leak. This is how I went about investigating:
>> > 1. Configure nginx with one worker
>> > 2. strace on the worker process, tracing
>> >    read/readv/write/writev/close/shutdown calls
>> > 3. Every now and then, for all the open fds (from ls -l /proc/<pid>/fd),
>> >    check the socket that is not available in netstat -pane
>> > 4. What I saw was, the leaking socket always had the last operation as
>> >    writev which returned an error.
>> > 5. Increased the nginx log level to info and verified that nginx was
>> >    getting ECONNRESET or EPIPE on writev failure. Which was OK.
>> > 6. Traced back in code to see how it is handled, the error translates to
>> >    CHAIN_ERROR and eventually causes ngx_http_finalize_request to be
>> >    called. This in turn calls ngx_http_terminate_request.
>> >
>> > However, in this function, the request is not terminated if
>> > r->write_event_handler is set. This seems to be set if the request
>> > handler is a user module. I think the rationale for the check is, if
>> > there is a module who is handling the request, don't terminate yet, wait
>> > for a write event on the socket and then terminate it (which is why I
>> > thought it is setting r->write_event_handler to
>> > ngx_http_terminate_handler).
>>
>> Rationale is to make sure there are no functions on stack which
>> assume request object is here and will try to access it after
>> we'll free request data.
>>
>> The r->write_event_handler (that is, ngx_http_terminate_handler())
>> is expected to be called by a ngx_http_run_posted_requests() which
>> in turn is called by low-level event handling functions (notably,
>> ngx_http_request_handler()).
>>
>> > I tried to repro this w/ empty_gif_handler however, it sends header and
>> > body in one call to writev which I can't get to fail in my test
>> > environment. To reproduce the bug, if I replace the call to
>> > ngx_http_send_response with ngx_http_send_header and
>> > ngx_http_output_filter (as used by ngx_upstream or other modules which
>> > don't have the headers and body together), I could reproduce the leak.
>> > I have a client that sends a request and closes the socket immediately,
>> > nginx sees the error, prints the info log, and then it doesn't close
>> > the socket.
>> >
>> > I have a small patch attached, the fix I did is basically saying that if
>> > there is a connection error, there is no point setting
>> > write_event_handler as there won't be any activity on the socket, so
>> > just terminate it immediately.
>> >
>> > I could be very wrong in the understanding of the code flow. My patch
>> > just fixes this and I am not very sure if this is the right fix. Please
>> > let me know.
>> >
>> > I will try to add a testcase to reproduce this in the nginx test
>> > framework.
>>
>> The patch looks wrong, see above.
>>
>> Could you please show a backtrace up to
>> ngx_http_terminate_request() with mr->write_event_handler and
>> c->error set (i.e. where you think leak happens)?
>>
>> You may also want to upgrade to a more recent version, e.g. 1.5.0,
>> to make sure the problem you are facing isn't already fixed.
>>
>> --
>> Maxim Dounin
>> http://nginx.org/en/donation.html
>>
>> _______________________________________________
>> nginx-devel mailing list
>> nginx-devel@nginx.org
>> http://mailman.nginx.org/mailman/listinfo/nginx-devel
>>
>
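Step 3 of the investigation quoted above (looking through the worker's open descriptors under /proc/<pid>/fd for sockets) could be scripted roughly as follows. This is a minimal sketch, assuming a Linux /proc filesystem; count_socket_fds is a hypothetical helper name, and the cross-check against netstat output mentioned in the thread is not covered here.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: counts descriptors of process `pid` whose
 * /proc/<pid>/fd link target looks like "socket:[inode]".
 * Returns -1 if the fd directory cannot be opened.
 * If `verbose` is non-zero, prints each match as "fd -> target". */
int count_socket_fds(pid_t pid, int verbose)
{
    char dir_path[64];
    snprintf(dir_path, sizeof(dir_path), "/proc/%ld/fd", (long) pid);

    DIR *dir = opendir(dir_path);
    if (dir == NULL) {
        return -1;
    }

    int count = 0;
    struct dirent *entry;

    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.') {
            continue;               /* skip "." and ".." */
        }

        char link_path[512];
        char target[256];
        snprintf(link_path, sizeof(link_path), "%s/%s",
                 dir_path, entry->d_name);

        /* Each entry is a symlink; sockets resolve to "socket:[inode]". */
        ssize_t len = readlink(link_path, target, sizeof(target) - 1);
        if (len < 0) {
            continue;               /* descriptor closed in the meantime */
        }
        assert(len < (ssize_t) sizeof(target));
        target[len] = '\0';

        if (strncmp(target, "socket:[", 8) == 0) {
            count++;
            if (verbose) {
                printf("%s -> %s\n", entry->d_name, target);
            }
        }
    }

    closedir(dir);
    return count;
}
```

Calling count_socket_fds(worker_pid, 1) periodically prints the socket inodes held by the worker, which can then be compared by hand with the inode column of netstat -pane, as described in the thread.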