The node version was 0.10.3. Since then, I have set up 3 instances: one with node 0.10.8, one with the original 0.10.3, and a third with 0.8.17, to see if there is a recurrence.
Incidentally, I am clearly noticing that the instance with 0.8.17 has lower CPU usage than the other two. All 3 are behind load balancers, so it looks like 0.10.x or the modules we use have some performance regression.

Regards
Qasim

On Tuesday, May 28, 2013 8:17:09 AM UTC+2, Qasim Zaidi wrote:
> Ever since we switched to 0.10.x in production, we are seeing instances
> where the server gets into a loop and CPU usage shoots to 100%. Here is
> what I have been able to understand so far about this situation.
>
> - After some time, we run out of descriptors, i.e. the nofile limit is
> reached. This triggers the 100% CPU loop, and node becomes unresponsive.

What node version is this? This was supposedly fixed in libuv-v0.10.6. To benefit from the fix you need to use node-v0.10.7 or later.

> - Something (maybe traffic spike) triggers a socket leak in the process, and
> we start with getting a lot of descriptors in close_wait state. This is
> very similar to what is reported at
> https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
> with Node 0.10.x).

Do the CLOSE_WAIT connections show up before or after node hits 100% CPU usage?

A socket in CLOSE_WAIT state means that the connection was completely wound down, but the file descriptor hasn't been closed. If a small fraction of sockets is in CLOSE_WAIT state there's nothing to worry about, node just hasn't gotten around to closing the file descriptor yet. It is also "expected" when the 100%-cpu issue kicks in since node basically hangs and won't be closing any FDs at all.

- Bert

On Tue, May 28, 2013 at 11:47 AM, Qasim Zaidi <[email protected]> wrote:
> Hi,
>
> Ever since we switched to 0.10.x in production, we are seeing instances
> where the server gets into a loop and CPU usage shoots to 100%. Here is
> what I have been able to understand so far about this situation.
>
> - Something (maybe traffic spike) triggers a socket leak in the process,
> and we start with getting a lot of descriptors in close_wait state. This
> is very similar to what is reported at
> https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
> with Node 0.10.x).
>
> - After some time, we run out of descriptors, i.e. the nofile limit is
> reached. This triggers the 100% CPU loop, and node becomes unresponsive.
>
> - Here is the gdb stacktrace for this process
>
> #0 0x00007f2ae2b57ee9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
> #1 0x00000000006e063b in uv__accept4 (fd=fd@entry=15, addr=addr@entry=0x0, addrlen=addrlen@entry=0x0, flags=flags@entry=526336) at ../deps/uv/src/unix/linux-syscalls.c:225
> #2 0x00000000006d310c in uv__accept (sockfd=sockfd@entry=15) at ../deps/uv/src/unix/core.c:394
> #3 0x00000000006da73c in uv__emfile_trick (accept_fd=15, loop=<optimized out>) at ../deps/uv/src/unix/stream.c:447
> #4 uv__server_io (loop=0xe4dc20 <default_loop_struct>, w=0x196d340, events=<optimized out>) at ../deps/uv/src/unix/stream.c:521
> #5 0x00000000006dedfd in uv__io_poll (loop=loop@entry=0xe4dc20 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:211
> #6 0x00000000006d2dc8 in uv_run (loop=0xe4dc20 <default_loop_struct>, mode=<optimized out>) at ../deps/uv/src/unix/core.c:312
> #7 0x0000000000595f10 in node::Start(int, char**) ()
> #8 0x00007f2ae2a8976d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
> #9 0x000000000058bba5 in _start ()
>
> It is continuously stuck between uv__server_io and uv__emfile_trick,
> opening and closing sockets. New connections are closed immediately, as
> described in the comments at the top of uv__emfile_trick, but it just won't
> get out of this loop, and the process as a whole is unresponsive. I am
> running with the cluster module, and accept backlog is set to 256.
>
> I previously suspected this to be an issue with our own code, but it
> doesn't look like that any longer. errno from uv__accept is EMFILE
>
> fd = uv__accept(uv__stream_fd(stream));
>
> This triggers the call to uv__emfile_trick, inside which the loop keeps
> closing the connections until it gets an errno of EDEADLK.
>
> Any suggestions on how to proceed - I can go back to 0.8.x or increase the
> max open file limit for the time being to make this go away, but would
> really like to understand the problem first.
>
> Regards
> Qasim

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/nodejs?hl=en?hl=en
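
For anyone following the stack trace who has not seen the technique before: the idea behind uv__emfile_trick, as far as the thread describes it, is that when accept fails with EMFILE the server gives up a descriptor it was holding in reserve, accepts and immediately closes whatever is pending, and then takes the reserve back. Below is a minimal Python sketch of that idea only. It is not libuv's C code; the class name `SpareFdAcceptor`, the `shed_total` counter, and the use of `os.devnull` as the reserve are all assumptions made for illustration.

```python
import errno
import os


class SpareFdAcceptor:
    """Accept connections, shedding pending ones when descriptors run out.

    Hypothetical helper for illustration; `listener` is expected to be a
    non-blocking listening socket (or anything with a compatible accept()).
    """

    def __init__(self, listener):
        self.listener = listener
        # Reserve one descriptor up front so it can be released later.
        self.spare_fd = os.open(os.devnull, os.O_RDONLY)
        self.shed_total = 0  # connections accepted only to be closed

    def accept(self):
        """Return an accepted connection, or None if none could be kept."""
        try:
            conn, _addr = self.listener.accept()
            return conn
        except BlockingIOError:
            return None  # nothing waiting in the backlog
        except OSError as exc:
            if exc.errno not in (errno.EMFILE, errno.ENFILE):
                raise
        # Out of descriptors: drop what is queued instead of spinning on it.
        self._shed_pending()
        return None

    def _shed_pending(self):
        """Release the reserve, accept-and-close the backlog, re-reserve."""
        os.close(self.spare_fd)
        try:
            while True:
                try:
                    conn, _addr = self.listener.accept()
                except OSError:
                    break  # backlog drained, or accept failed again
                conn.close()
                self.shed_total += 1
        finally:
            self.spare_fd = os.open(os.devnull, os.O_RDONLY)
```

The point of closing the pending connections rather than leaving them queued is that a listening socket with a non-empty backlog keeps reporting as readable, so a loop that cannot accept would otherwise be woken again immediately. Note that this sheds load but does nothing about the leak itself; if descriptors are never released elsewhere, the process will keep landing in this path.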

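
On the question of whether the CLOSE_WAIT sockets show up before or after the CPU hits 100%: one way to find out is to sample the socket states and the descriptor count of the node process at intervals and log them with a timestamp. The sketch below is one possible way to do that on Linux (the gdb trace suggests a Linux host), reading `/proc/net/tcp` and `/proc/<pid>/fd`. The function names are hypothetical, it only looks at IPv4 sockets (IPv6 ones are listed in `/proc/net/tcp6`), and the state table is deliberately partial.

```python
import os

# Some of the state codes used in the "st" column of /proc/net/tcp (Linux).
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_tcp_states(proc_net_tcp_text, inodes=None):
    """Count sockets per TCP state from the text of /proc/net/tcp.

    If `inodes` is given (a set of inode strings), only sockets whose
    inode is in that set are counted, which limits the result to one process.
    """
    counts = {}
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 10:
            continue
        if inodes is not None and fields[9] not in inodes:
            continue
        state = TCP_STATES.get(fields[3], fields[3])
        counts[state] = counts.get(state, 0) + 1
    return counts


def socket_inodes(pid):
    """Return the inodes of sockets held open by `pid`, via /proc/<pid>/fd."""
    found = set()
    fd_dir = "/proc/%d/fd" % pid
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed while we were looking
        if target.startswith("socket:[") and target.endswith("]"):
            found.add(target[len("socket:["):-1])
    return found


def sample(pid):
    """Take one snapshot: open descriptor count plus TCP states for `pid`."""
    with open("/proc/net/tcp") as handle:
        text = handle.read()
    return {
        "open_fds": len(os.listdir("/proc/%d/fd" % pid)),
        "tcp": count_tcp_states(text, socket_inodes(pid)),
    }
```

Calling `sample(pid)` every few seconds for each worker and comparing the CLOSE_WAIT count against the nofile limit should show whether the leak builds up gradually before the spike or only appears once the process has already stopped responding.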