Hi Ben,

we have two applications: one with two cores and one with four. The
four-core application gets more traffic, so we switched it to separate
processes balanced by nginx. This works fine and the load is down.
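
Roughly, each instance is now just a plain node process on its own port,
and nginx spreads the requests over them. A simplified sketch (the PORT
variable and the port numbers are placeholders, not our real config):

    // one plain node process per core, no cluster module; nginx has an
    // upstream block pointing at 127.0.0.1:3001..3004 and round-robins
    // requests across these processes
    var http = require('http');

    var port = parseInt(process.env.PORT, 10) || 3001;

    http.createServer(function (req, res) {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('handled by pid ' + process.pid + '\n');
    }).listen(port, '127.0.0.1', function () {
      console.log('listening on ' + port);
    });

Each one is started as e.g. `PORT=3001 node server.js`, so there is no
IPC between the processes at all.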

On the other server I have installed strace. That application is still
running in cluster mode so I can gather some data. But the load is low
right now and the node cluster isn't doing hard work, so I'm waiting for
an occasion to gather high-load data. Then I will post it.

Thank you for your help!


best regards
jens himmelreich

On Friday, August 16, 2013 at 7:51:19 PM UTC+2, Ben Noordhuis wrote:
>
> On Fri, Aug 16, 2013 at 7:00 PM, Jens Himmelreich 
> <[email protected]> wrote: 
> > 
> > On Tuesday, August 13, 2013 at 10:35:42 PM UTC+2, Ben Noordhuis wrote: 
> >> 
> >> On Tue, Aug 13, 2013 at 5:04 PM, Jens Himmelreich 
> >> <[email protected]> wrote: 
> >> > 
> >> > On Monday, August 12, 2013 at 7:03:27 PM UTC+2, Ben Noordhuis wrote: 
> >> >> 
> >> >> On Mon, Aug 12, 2013 at 6:05 PM, Jens Himmelreich 
> >> >> <[email protected]> wrote: 
> >> >> > Hi, 
> >> >> > 
> >> >> > I have a node server between an nginx webserver (static content) 
> >> >> > and a varnish. Behind the varnish are two application servers for 
> >> >> > search and cart-hosting. The node machine has 4 cores. Every 
> >> >> > machine has a load of 0.1 except the node machine; it has a load 
> >> >> > of 4.0. The node application is an express application running in 
> >> >> > a node cluster with 4 workers. http.globalAgent.maxSockets is set 
> >> >> > to 400. 
> >> >> > Does anyone know why the load is high? 
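> >> >> > 
> >> >> > (For illustration, the setup is roughly the shape of the sketch 
> >> >> > below; the handler and port are placeholders, not our real code.) 
> >> >> > 
> >> >> >     var cluster = require('cluster');
> >> >> >     var http = require('http');
> >> >> >     var express = require('express');
> >> >> > 
> >> >> >     // cap on concurrent outgoing connections (e.g. to varnish)
> >> >> >     http.globalAgent.maxSockets = 400;
> >> >> > 
> >> >> >     if (cluster.isMaster) {
> >> >> >       // one worker per core on the 4-core machine
> >> >> >       for (var i = 0; i < 4; i++) cluster.fork();
> >> >> >     } else {
> >> >> >       var app = express();
> >> >> >       app.get('/', function (req, res) { res.send('ok'); });
> >> >> >       app.listen(3000);
> >> >> >     }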
> >> >> 
> >> >> Because it's doing a lot?  Your post is too light on details to 
> >> >> really make informed suggestions.  Try profiling your application 
> >> >> to see where it's spending its time. 
> >> > 
> >> > 
> >> > I see, I need to do some profiling. It's a little bit complicated 
> >> > because we use coffeescript. The --nodejs --prof parameters don't 
> >> > get passed through to the workers. 
> >> 
> >> --prof works with the cluster module, that's something I fixed a while 
> >> ago. 
> >> 
> >> If you're running v0.10 or v0.11, node.js writes out <n> v8-%p.log 
> >> files, where <n> is the number of workers and %p the worker's PID. 
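> >> 
> >> (Illustrative only: one quick way to match the logs to the workers is 
> >> to have each worker print its own PID, since %p in the file name is 
> >> that PID.) 
> >> 
> >>     // In a worker, log the PID once so the corresponding v8-<pid>.log
> >>     // can be identified later (sketch, not from the real application):
> >>     var cluster = require('cluster');
> >>     if (cluster.isWorker) {
> >>       console.log('worker pid:', process.pid);
> >>     }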
> >> 
> >> > The globally installed coffee is used, which I could instrument. I 
> >> > tried it without the cluster - to get a quick result for --prof - and 
> >> > I see that the load drops to 0.02. So it seems to be a problem with 
> >> > the cluster. 
> >> > 
> >> > (I know, I will still do the profiling.) Has anyone seen a cluster 
> >> > load that rises to 1 per CPU? 
> >> 
> >> Usually when people see spikes like that with the cluster module, it's 
> >> because they're sending huge messages between processes. 
> >> 
> >> `process.send(msg)` serializes `msg` to JSON, then deserializes it 
> >> again on the other end.  If msg is a big array or object, that 
> >> consumes lots of memory and CPU time. 
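> >> 
> >> (Roughly, the cost model looks like this; the payload below is made 
> >> up, just to show where the cycles go.) 
> >> 
> >>     // Every process.send() pays a stringify on the sending side and
> >>     // a parse on the receiving side, over the whole message:
> >>     var msg = { rows: new Array(100000) };   // a big, made-up payload
> >>     var wire = JSON.stringify(msg);          // cost paid by the sender
> >>     var copy = JSON.parse(wire);             // cost paid by the receiver
> >>     // Sending many small messages, or just an id that the other side
> >>     // can look up itself, avoids most of that work.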
> >> 
> >> > best regards 
> >> > jens 
> > 
> > 
> > Hi, 
> > 
> > I have done my profiling homework. (Your hint about sending big messages 
> > is very interesting!) 
> > 
> > The master (5% CPU) is doing the following: 
> > 
> >  [Shared libraries]: 
> >    ticks  total  nonlib   name 
> >   2408880   99.9%    0.0%  /lib/libc-2.11.3.so 
> >    2029    0.1%    0.0%  /usr/local/bin/node-p 
> >      39    0.0%    0.0%  /lib/libpthread-2.11.3.so 
> >       5    0.0%    0.0%  ffffffffff600000-ffffffffff601000 
> >       2    0.0%    0.0%  /usr/lib/libstdc++.so.6.0.13 
> >       1    0.0%    0.0%  7fff11fb6000-7fff11fb7000 
> > 
> > [Bottom up (heavy) profile]: 
> >   Note: percentage shows a share of a particular caller in the total 
> >   amount of its parent calls. 
> >   Callers occupying less than 2.0% are not shown. 
> > 
> >    ticks parent  name 
> >   2408880   99.9%  /lib/libc-2.11.3.so 
> > 
> > 
> > And every worker (100% CPU) is doing the following: 
> > 
> >  [Shared libraries]: 
> >    ticks  total  nonlib   name 
> >   1462783   60.7%    0.0%  /lib/libc-2.11.3.so 
> >   764186   31.7%    0.0%  ffffffffff600000-ffffffffff601000 
> >   108276    4.5%    0.0%  /usr/local/bin/node-p 
> >   38058    1.6%    0.0%  7fff8c57d000-7fff8c57e000 
> >   30770    1.3%    0.0%  /lib/librt-2.11.3.so 
> >     366    0.0%    0.0%  /lib/libpthread-2.11.3.so 
> >      21    0.0%    0.0%  /usr/lib/libstdc++.so.6.0.13 
> >       2    0.0%    0.0%  /lib/libm-2.11.3.so 
> > 
> > [JavaScript]: 
> >    ticks  total  nonlib   name 
> >     295    0.0%    6.8%  LazyCompile: *exports.flatten.flatten /srv/fricke/nodeshared/node_modules/coffee-script/lib/coffee-script/helpers.js:70 
> >     187    0.0%    4.3%  KeyedLoadIC: A keyed load IC from the snapshot 
> >     162    0.0%    3.7%  Stub: CEntryStub 
> >     150    0.0%    3.4%  LazyCompile: *exports.Base.Base.eachChild /srv/fricke/nodeshared/node_modules/coffee-script/lib/coffee-script/nodes.js:179 
> >     111    0.0%    2.5%  LazyCompile: walk /srv/fricke/nodeshared/node_modules/connect-assets/node_modules/snockets/node_modules/uglify-js/lib/process.js:207 
> >      83    0.0%    1.9%  LazyCompile: ToString native runtime.js:550 
> >      82    0.0%    1.9%  LazyCompile: parse /srv/fricke/nodeshared/node_modules/coffee-script/lib/coffee-script/parser.js:485 
> >      71    0.0%    1.6%  Stub: KeyedLoadElementStub 
> >      68    0.0%    1.6%  Builtin: A builtin from the snapshot {4} 
> >      63    0.0%    1.4%  LazyCompile: *exports.Base.Base.updateLocationDataIfMissing /srv/fricke/nodeshared/node_modules/coffee-script/lib/coffee-script/nodes.js:242 
> >      61    0.0%    1.4%  LazyCompile: *DoConstructRegExp native regexp.js:35 
> >      57    0.0%    1.3%  Stub: InstanceofStub_INLINE 
> > [...] 
> > 
> > [Bottom up (heavy) profile]: 
> >   Note: percentage shows a share of a particular caller in the total 
> >   amount of its parent calls. 
> >   Callers occupying less than 2.0% are not shown. 
> > 
> >    ticks parent  name 
> >   1462783   60.7%  /lib/libc-2.11.3.so 
> > 
> >   764186   31.7%  ffffffffff600000-ffffffffff601000 
> > 
> >   108276    4.5%  /usr/local/bin/node-p 
> > 
> > 
> > Do you see any problematic pattern? Should I post the whole profile? 
> > If I don't use the cluster, node runs with a load of 0.02. 
> > At the moment we are trying nginx as the balancer with separate node 
> > processes instead of the cluster module. 
> > 
> > 
> > best regards 
> > jens himmelreich 
>
> Interesting.  The fact that the workers spend nearly all their time 
> in libc and the vDSO (the kernel gateway, kind of) suggests that 
> they're doing something that is computationally expensive and involves 
> a lot of I/O.  Message passing is a likely culprit. 
>
> What do you see when you trace a worker with `strace -p <pid>` or 
> `perf record -p <pid>`?  (strace traces system calls, perf is a 'trace 
> everything' tool.  Note that running perf that way probably requires 
> root privileges. ) 
>
> `perf report` or `perf report --stdio` shows a breakdown of the data 
> recorded by `perf record`.  It would not surprise me if there are many 
> string and/or memory handling functions in the top 10. 
>
> Tip: if the summary is dominated by kernel code, try `perf record -e 
> cycles:u -p <pid>` to trace only user-mode code. 
>
