Hi,

Ever since we switched to 0.10.x in production, we have been seeing instances
where the server gets into a loop and CPU usage shoots to 100%. Here is
what I have been able to understand so far about this situation.

- Something (perhaps a traffic spike) triggers a socket leak in the process,
and we start accumulating a lot of descriptors in the CLOSE_WAIT state. This
is very similar to what is reported at
https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
with Node 0.10.x).

- After some time, we run out of descriptors, i.e. the nofile limit is
reached. This triggers the 100% CPU loop, and node becomes unresponsive.

- Here is the gdb stack trace for this process:

    #0  0x00007f2ae2b57ee9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00000000006e063b in uv__accept4 (fd=fd@entry=15, addr=addr@entry=0x0, addrlen=addrlen@entry=0x0, flags=flags@entry=526336)
        at ../deps/uv/src/unix/linux-syscalls.c:225
    #2  0x00000000006d310c in uv__accept (sockfd=sockfd@entry=15) at ../deps/uv/src/unix/core.c:394
    #3  0x00000000006da73c in uv__emfile_trick (accept_fd=15, loop=<optimized out>) at ../deps/uv/src/unix/stream.c:447
    #4  uv__server_io (loop=0xe4dc20 <default_loop_struct>, w=0x196d340, events=<optimized out>) at ../deps/uv/src/unix/stream.c:521
    #5  0x00000000006dedfd in uv__io_poll (loop=loop@entry=0xe4dc20 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:211
    #6  0x00000000006d2dc8 in uv_run (loop=0xe4dc20 <default_loop_struct>, mode=<optimized out>) at ../deps/uv/src/unix/core.c:312
    #7  0x0000000000595f10 in node::Start(int, char**) ()
    #8  0x00007f2ae2a8976d in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
    #9  0x000000000058bba5 in _start ()
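To watch a process approach the nofile limit mentioned in the list above, one option is to compare the number of open descriptors with the soft RLIMIT_NOFILE. The following is a minimal Linux-only sketch, assuming /proc is mounted; the function names are hypothetical. For a running Node process the same numbers can be read from outside through /proc/<pid>/fd and /proc/<pid>/limits.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: count open descriptors by listing /proc/self/fd
 * (Linux-specific). Returns -1 if /proc is unavailable. */
static long count_open_fds(void) {
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return -1;

    long n = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] != '.')       /* skip "." and ".." */
            n++;
    }
    closedir(dir);

    assert(n >= 1);                      /* the listing includes dir's own fd */
    return n - 1;                        /* do not count opendir()'s descriptor */
}

/* Hypothetical helper: descriptors still available before open()/accept()
 * start failing with EMFILE, assuming open fds are packed densely from 0. */
static long fd_headroom(void) {
    struct rlimit lim;
    long used = count_open_fds();
    if (used < 0 || getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    return (long)lim.rlim_cur - used;
}

/* Print a one-line summary, e.g. from a periodic health check. */
static void report_fd_headroom(FILE *out) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) == 0)
        fprintf(out, "open fds: %ld, soft nofile limit: %llu, headroom: %ld\n",
                count_open_fds(), (unsigned long long)lim.rlim_cur,
                fd_headroom());
}
```

A steadily shrinking headroom while traffic is flat would be consistent with the descriptor leak described in the first point.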


It is continuously stuck cycling between uv__server_io and uv__emfile_trick,
opening and closing sockets. New connections are closed immediately, as
described in the comments at the top of uv__emfile_trick, but the process
never gets out of this loop and stays unresponsive as a whole. I am running
with the cluster module, and the accept backlog is set to 256.

I previously suspected an issue in our own code, but that no longer looks
likely. The errno from uv__accept is EMFILE at this call:

    fd = uv__accept(uv__stream_fd(stream));

This triggers the call to uv__emfile_trick, inside which the loop keeps
closing the connections until it gets an errno of EDEADLK.
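The general idea behind an EMFILE trick can be sketched in C as follows: keep one spare descriptor open, and when accept() fails with EMFILE, give the spare back, accept and immediately close every queued connection until accept() reports an empty queue, then reopen the spare. This is a hypothetical illustration, not libuv's actual code; emfile_trick_sketch and run_demo are made-up names, the sketch treats any errno other than EAGAIN/EWOULDBLOCK as a failure, and the demo assumes Linux and a Unix-domain listener under /tmp so the backlog fills deterministically.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical sketch of the EMFILE trick (not libuv's code): give back one
 * spare descriptor, accept-and-close every queued connection until accept()
 * reports an empty queue, then reopen the spare descriptor.
 * Returns the number of connections dropped, or -1 on failure. */
static int emfile_trick_sketch(int listen_fd, int *reserve_fd) {
    assert(reserve_fd != NULL);
    if (*reserve_fd == -1)
        return -1;                       /* no spare slot to give back */

    close(*reserve_fd);                  /* frees exactly one slot */
    *reserve_fd = -1;

    int dropped = 0, err = 0;
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd != -1) {
            close(fd);                   /* the client sees an immediate close */
            dropped++;
            continue;
        }
        err = errno;
        if (err != EINTR)
            break;
    }

    *reserve_fd = open("/", O_RDONLY | O_CLOEXEC);   /* re-arm the spare */
    return (err == EAGAIN || err == EWOULDBLOCK) ? dropped : -1;
}

/* Hypothetical demo (Linux): queue `nclients` connections on a non-blocking
 * Unix-domain listener, exhaust the descriptor table by lowering the soft
 * RLIMIT_NOFILE, then run the trick. Error paths skip cleanup for brevity. */
static int run_demo(int nclients, int *errno_when_full, int *reserve_ok) {
    assert(nclients >= 0 && nclients <= 8);
    struct sockaddr_un addr = { .sun_family = AF_UNIX };
    snprintf(addr.sun_path, sizeof addr.sun_path,
             "/tmp/emfile-demo-%d.sock", (int)getpid());
    unlink(addr.sun_path);

    int lfd = socket(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
    if (lfd == -1 || bind(lfd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(lfd, 16) != 0)
        return -1;

    int clients[8];
    for (int i = 0; i < nclients; i++) {
        clients[i] = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0);
        if (clients[i] == -1 ||
            connect(clients[i], (struct sockaddr *)&addr, sizeof addr) != 0)
            return -1;                   /* on success, queued in the backlog */
    }

    int reserve = open("/", O_RDONLY | O_CLOEXEC);
    struct rlimit old, low;
    if (reserve == -1 || getrlimit(RLIMIT_NOFILE, &old) != 0)
        return -1;
    low = old;
    low.rlim_cur = (rlim_t)reserve + 4;  /* keep the spare below the limit */
    if (setrlimit(RLIMIT_NOFILE, &low) != 0)
        return -1;

    int filler[64], nfill = 0;           /* occupy every remaining slot */
    while (nfill < 64 && (filler[nfill] = open("/", O_RDONLY | O_CLOEXEC)) != -1)
        nfill++;

    *errno_when_full = (accept(lfd, NULL, NULL) == -1) ? errno : 0;
    int dropped = emfile_trick_sketch(lfd, &reserve);
    *reserve_ok = (reserve != -1);

    for (int i = 0; i < nfill; i++)
        close(filler[i]);
    if (reserve != -1)
        close(reserve);
    for (int i = 0; i < nclients; i++)
        close(clients[i]);
    close(lfd);
    unlink(addr.sun_path);
    setrlimit(RLIMIT_NOFILE, &old);      /* restore the original limit */
    return dropped;
}
```

On Linux, calling run_demo(3, &e, &ok) should leave e equal to EMFILE and return 3. After the trick returns, the reopened spare occupies the freed slot again, so the table is exactly as full as before and the next accept() on a non-empty queue fails with EMFILE once more.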

Any suggestions on how to proceed? I can go back to 0.8.x or raise the
open file limit for the time being to make this go away, but I would
really like to understand the problem first.
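For the stopgap of raising the open file limit, the usual route for a Node process is `ulimit -n` in the shell that launches it, which sets RLIMIT_NOFILE. The same thing can be done from inside a C process; a minimal sketch follows, with a hypothetical function name, assuming an unprivileged process that may only raise its soft limit up to the hard limit.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_NOFILE to the hard limit.
 * Returns the resulting soft limit, or -1 on error or if the hard limit
 * is reported as unlimited. */
static long raise_nofile_to_hard_limit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    assert(lim.rlim_cur <= lim.rlim_max);    /* kernel invariant */

    if (lim.rlim_max == RLIM_INFINITY)
        return -1;                           /* not a usable descriptor count */

    if (lim.rlim_cur < lim.rlim_max) {
        lim.rlim_cur = lim.rlim_max;
        if (setrlimit(RLIMIT_NOFILE, &lim) != 0) {
            perror("setrlimit(RLIMIT_NOFILE)");
            return -1;
        }
    }
    return (long)lim.rlim_cur;
}
```

If descriptors really are leaking in CLOSE_WAIT, a higher limit only postpones the point at which accept() starts failing with EMFILE; it does not remove the leak.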

Regards
Qasim

-- 
-- 
Job Board: http://jobs.nodejs.org/
Posting guidelines: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en
