Also, on the CLOSE_WAIT question: the number of CLOSE_WAIT sockets is the
same as the nofile limit when this happens.

The process keeps closing sockets right after accepting them. That is what
the comment says, and that's exactly the behaviour I have observed in gdb.
I don't know yet which comes first, the CLOSE_WAIT increase or the 100% CPU
usage.
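
For anyone checking the same thing, here is one way to compare a process's
open-descriptor count against the nofile limit (a minimal Linux-specific C
sketch; an illustration, not something from our production code):

    /* Minimal sketch: compare this process's open-descriptor count to the
     * soft RLIMIT_NOFILE. Linux-specific (scans /proc/self/fd); for another
     * process, scan /proc/<pid>/fd instead. Illustration only. */
    #include <dirent.h>
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
      struct rlimit rl;
      if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
      }

      unsigned long nfds = 0;
      DIR* dir = opendir("/proc/self/fd");  /* itself uses one fd briefly */
      if (dir != NULL) {
        struct dirent* ent;
        while ((ent = readdir(dir)) != NULL)
          if (ent->d_name[0] != '.')        /* skip "." and ".." */
            nfds++;
        closedir(dir);
      }

      printf("open fds: %lu, soft nofile limit: %lu\n",
             nfds, (unsigned long)rl.rlim_cur);
      return 0;
    }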


On Wed, May 29, 2013 at 10:51 PM, Qasim Zaidi <[email protected]> wrote:

> The node version was 0.10.3. Since then, I have set up 3 instances: one
> with node 0.10.8, another with the original 0.10.3, and a third with
> 0.8.17, to see if there is a recurrence.
>
> Incidentally, I am clearly noticing that the instance with 0.8.17 has
> lower CPU usage than the other two. All 3 are behind load balancers, so it
> looks like 0.10.x or the modules we use have a performance regression.
>
> Regards
> Qasim
>
> On Tuesday, May 28, 2013 8:17:09 AM UTC+2, Qasim Zaidi wrote:
>
>>  Ever since we switched to 0.10.x in production, we are seeing instances
>> where the server gets into a loop and CPU usage shoots to 100%. Here is
>> what I have been able to understand so far about this situation.
>>
>> - After some time, we run out of descriptors, i.e. the nofile limit is
>> reached. This triggers the 100% CPU loop, and node becomes unresponsive.
>>
>
> What node version is this? This was supposedly fixed in libuv-v0.10.6. To
> benefit from the fix you need to use node-v0.10.7 or later.
>
>> Something (maybe a traffic spike) triggers a socket leak in the process,
>> and we start getting a lot of descriptors in CLOSE_WAIT state. This is
>> very similar to what is reported at
>> https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
>> with Node 0.10.x).
>
>
> Do the CLOSE_WAIT connections show up before or after node hits 100% CPU
> usage? A socket in CLOSE_WAIT state means that the remote end has closed
> the connection, but the local file descriptor hasn't been closed yet. If a
> small fraction of sockets is in CLOSE_WAIT state there's nothing to worry
> about; node just hasn't gotten around to closing the file descriptor yet.
> It is also "expected" when the 100%-CPU issue kicks in, since node
> basically hangs and won't be closing any FDs at all.
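>
> For illustration, a socket sits in CLOSE_WAIT from the moment the peer
> closes its end until the local process calls close() on the fd. A minimal
> C sketch of that life cycle (hypothetical code, not anything from node or
> libuv):
>
>     #include <unistd.h>
>
>     /* Sketch: the life cycle that produces CLOSE_WAIT. 'fd' is a
>      * connected socket (e.g. returned by accept()). When the peer closes
>      * its end, read() returns 0 and the kernel parks the socket in
>      * CLOSE_WAIT; it stays there until this process calls close(fd). */
>     static void drain_and_close(int fd) {
>       char buf[512];
>       ssize_t n;
>       while ((n = read(fd, buf, sizeof(buf))) > 0)
>         ;                  /* consume whatever the peer sent */
>       /* n == 0: peer sent FIN; the socket is now in CLOSE_WAIT. */
>       close(fd);           /* forgetting this close() leaks a CLOSE_WAIT */
>     }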
>
> - Bert
>
>
> On Tue, May 28, 2013 at 11:47 AM, Qasim Zaidi <[email protected]> wrote:
>
>> Hi,
>>
>> Ever since we switched to 0.10.x in production, we are seeing instances
>> where the server gets into a loop and CPU usage shoots to 100%. Here is
>> what I have been able to understand so far about this situation.
>>
>> - Something (maybe a traffic spike) triggers a socket leak in the
>> process, and we start getting a lot of descriptors in CLOSE_WAIT state.
>> This is very similar to what is reported at
>> https://github.com/einaros/ws/issues/180 (Connections stay in CLOSE_WAIT
>> with Node 0.10.x).
>>
>> - After some time, we run out of descriptors, i.e. the nofile limit is
>> reached. This triggers the 100% CPU loop, and node becomes unresponsive.
>>
>> - Here is the gdb stack trace for this process:
>>
>> #0  0x00007f2ae2b57ee9 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
>> #1  0x00000000006e063b in uv__accept4 (fd=fd@entry=15, addr=addr@entry=0x0,
>> addrlen=addrlen@entry=0x0, flags=flags@entry=526336)
>>     at ../deps/uv/src/unix/linux-syscalls.c:225
>> #2  0x00000000006d310c in uv__accept (sockfd=sockfd@entry=15) at
>> ../deps/uv/src/unix/core.c:394
>> #3  0x00000000006da73c in uv__emfile_trick (accept_fd=15, loop=<optimized
>> out>) at ../deps/uv/src/unix/stream.c:447
>> #4  uv__server_io (loop=0xe4dc20 <default_loop_struct>, w=0x196d340,
>> events=<optimized out>) at ../deps/uv/src/unix/stream.c:521
>> #5  0x00000000006dedfd in uv__io_poll (loop=loop@entry=0xe4dc20
>> <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:211
>> #6  0x00000000006d2dc8 in uv_run (loop=0xe4dc20 <default_loop_struct>,
>> mode=<optimized out>) at ../deps/uv/src/unix/core.c:312
>> #7  0x0000000000595f10 in node::Start(int, char**) ()
>> #8  0x00007f2ae2a8976d in __libc_start_main () from
>> /lib/x86_64-linux-gnu/libc.so.6
>> #9  0x000000000058bba5 in _start ()
>>
>>
>> It is continuously stuck between uv__server_io and uv__emfile_trick,
>> opening and closing sockets. New connections are closed immediately, as
>> described in the comments at the top of uv__emfile_trick, but it just won't
>> get out of this loop, and the process as a whole is unresponsive. I am
>> running with the cluster module, and the accept backlog is set to 256.
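>>
>> For context, the recovery pattern that comment describes is roughly the
>> following (my simplified reading of it as plain POSIX C; this is not the
>> actual libuv source):
>>
>>     /* Simplified sketch of the "emfile trick" idea: keep one spare fd
>>      * open; on EMFILE, close the spare to free a slot, then accept and
>>      * instantly close the pending connections so their clients see the
>>      * close promptly, and finally re-open the spare. */
>>     #include <errno.h>
>>     #include <fcntl.h>
>>     #include <sys/socket.h>
>>     #include <unistd.h>
>>
>>     static int spare_fd = -1;  /* opened to /dev/null at startup */
>>
>>     static void emfile_trick(int server_fd) {
>>       if (spare_fd == -1)
>>         return;
>>       close(spare_fd);         /* free one descriptor slot */
>>
>>       for (;;) {
>>         int fd = accept(server_fd, NULL, NULL);
>>         if (fd == -1) {
>>           if (errno == EINTR)
>>             continue;
>>           break;               /* e.g. EAGAIN: the backlog is drained */
>>         }
>>         close(fd);             /* reject the connection immediately */
>>       }
>>
>>       spare_fd = open("/dev/null", O_RDONLY);  /* restore the spare */
>>     }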
>>
>> I previously suspected this to be an issue with our own code, but it
>> doesn't look like that any longer. The errno from uv__accept is EMFILE:
>>
>>     fd = uv__accept(uv__stream_fd(stream));
>>
>> This triggers the call to uv__emfile_trick, inside which the loop keeps
>> closing connections until it gets an errno of EDEADLK.
>>
>> Any suggestions on how to proceed? I can go back to 0.8.x or increase
>> the max open file limit for the time being to make this go away, but I
>> would really like to understand the problem first.
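>>
>> For the stop-gap, the limit can also be raised from inside the process
>> with setrlimit; a minimal sketch, for illustration only:
>>
>>     /* Minimal sketch of the stop-gap: raise the soft nofile limit to
>>      * the hard limit before the server starts. Raising the hard limit
>>      * itself needs root/CAP_SYS_RESOURCE. Illustration only. */
>>     #include <stdio.h>
>>     #include <sys/resource.h>
>>
>>     static int raise_nofile_limit(void) {
>>       struct rlimit rl;
>>       if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
>>         return -1;
>>       rl.rlim_cur = rl.rlim_max;   /* soft limit up to the hard limit */
>>       if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
>>         return -1;
>>       fprintf(stderr, "nofile soft limit is now %lu\n",
>>               (unsigned long)rl.rlim_cur);
>>       return 0;
>>     }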
>>
>> Regards
>> Qasim
>>
>>
>>
>
