[nodejs] Re: 100% CPU usage with node, EMFILE, CLOSE_WAIT

Qasim Zaidi Wed, 29 May 2013 10:22:31 -0700

The node version was 0.10.3. Since then, I have set up 3 instances: one
with node 0.10.8, another with the original 0.10.3, and a third with
0.8.17, to see if there is a recurrence.

Incidentally, I am clearly noticing that the instance with 0.8.17 has lower
CPU usage than the other two. All 3 are behind load balancers, so it looks
like 0.10.x or the modules we use have some performance regression.

Regards
Qasim

On Tuesday, May 28, 2013 8:17:09 AM UTC+2, Qasim Zaidi wrote:

> Ever since we switched to 0.10.x in production, we are seeing instances
> where the server gets into a loop and CPU usage shoots to 100%. Here is
> what I have been able to understand so far about this situation.
>
> - After some time, we run out of descriptors, i.e. the nofile limit is
> reached. This triggers the 100% CPU loop, and node becomes unresponsive.
>

What node version is this? This was supposedly fixed in libuv-v0.10.6. To
benefit from the fix you need to use node-v0.10.7 or later.

> - Something (maybe traffic spike) triggers a socket leak in the process, and
> we start with getting a lot of descriptors in close_wait state.  This is
> very similar to what is reported at
> https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
> with Node 0.10.x).


Do the CLOSE_WAIT connections show up before or after node hits 100% CPU
usage? A socket in CLOSE_WAIT state means that the remote end has closed
the connection, but the local file descriptor hasn't been closed yet. If a
small fraction of sockets is in CLOSE_WAIT state there's nothing to worry
about, node just hasn't gotten around to closing the file descriptor yet.
It is also "expected" when the 100%-cpu issue kicks in since node basically
hangs and won't be closing any FDs at all.

- Bert


On Tue, May 28, 2013 at 11:47 AM, Qasim Zaidi <[email protected]> wrote:

> Hi,
>
> Ever since we switched to 0.10.x in production, we are seeing instances
> where the server gets into a loop and CPU usage shoots to 100%. Here is
> what I have been able to understand so far about this situation.
>
> - Something (maybe traffic spike) triggers a socket leak in the process,
> and we start with getting a lot of descriptors in close_wait state.  This
> is very similar to what is reported at
> https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
> with Node 0.10.x).
>
> - After some time, we run out of descriptors, i.e. the nofile limit is
> reached. This triggers the 100% CPU loop, and node becomes unresponsive.
>
> - Here is the gdb stacktrace for this process
>
> #0  0x00007f2ae2b57ee9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00000000006e063b in uv__accept4 (fd=fd@entry=15, addr=addr@entry=0x0,
> addrlen=addrlen@entry=0x0, flags=flags@entry=526336)
>     at ../deps/uv/src/unix/linux-syscalls.c:225
> #2  0x00000000006d310c in uv__accept (sockfd=sockfd@entry=15) at
> ../deps/uv/src/unix/core.c:394
> #3  0x00000000006da73c in uv__emfile_trick (accept_fd=15, loop=<optimized
> out>) at ../deps/uv/src/unix/stream.c:447
> #4  uv__server_io (loop=0xe4dc20 <default_loop_struct>, w=0x196d340,
> events=<optimized out>) at ../deps/uv/src/unix/stream.c:521
> #5  0x00000000006dedfd in uv__io_poll (loop=loop@entry=0xe4dc20
> <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:211
> #6  0x00000000006d2dc8 in uv_run (loop=0xe4dc20 <default_loop_struct>,
> mode=<optimized out>) at ../deps/uv/src/unix/core.c:312
> #7  0x0000000000595f10 in node::Start(int, char**) ()
> #8  0x00007f2ae2a8976d in __libc_start_main () from
> /lib/x86_64-linux-gnu/libc.so.6
> #9  0x000000000058bba5 in _start ()
>
>
> It is continuously stuck between uv__server_io and uv__emfile_trick,
> opening and closing sockets. New connections are closed immediately, as
> described in the comments at the top of uv__emfile_trick, but it just won't
> get out of this loop, and the process as a whole is unresponsive. I am
> running with the cluster module, and accept backlog is set to 256.
>
> I previously suspected this to be an issue with our own code, but it
> doesn't look like that any longer. The errno from uv__accept is EMFILE:
>
>     fd = uv__accept(uv__stream_fd(stream));
>
> This triggers the call to uv__emfile_trick, inside which the loop keeps
> closing the connections until it gets an errno of EDEADLK.
>
> Any suggestions on how to proceed? I can go back to 0.8.x or increase the
> max open file limit for the time being to make this go away, but I would
> really like to understand the problem first.
>
> Regards
> Qasim
>
>
>
