We are trying to debug a poorly performing Node application and would 
appreciate any help or advice from this community. We have a Node 
application that serves as the user-facing frontend for a payment platform 
- code here: https://github.com/alphagov/pay-frontend. We are in the 
process of assessing and expanding our capacity to meet increasing demand. 
We have a target of being able to serve X payment journeys per second. 
A payment journey comprises 4 pages, two of which require a form submission.
Each page in the journey entails some communication between the Node 
application in question (which we helpfully call frontend) and other 
microservices to establish the current status of the payment etc. - on 
average around 2 HTTP calls per page.
By carrying out performance tests (using Gatling) we have found that in 
order to meet our target of X tx/s, we have to provision around X/2 
frontend nodes, i.e. each frontend node appears capable of processing 
around 2 payment journeys per second on average.
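For concreteness, here is the back-of-envelope arithmetic behind that 
reckoning (the figures are the ones quoted above):

```javascript
// Back-of-envelope load per frontend node, using the figures above:
const journeysPerSecond = 2;  // observed capacity of one frontend node
const pagesPerJourney = 4;    // a payment journey comprises 4 pages
const callsPerPage = 2;       // average downstream HTTP calls per page

const downstreamCallsPerSecond =
  journeysPerSecond * pagesPerJourney * callsPerPage;
console.log(downstreamCallsPerSecond); // 16 downstream calls/s per node
```

~16 mostly I/O-bound requests per second is a tiny workload for a single 
Node process, which is why the measured ceiling feels so far off.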
This seems wrong - by my reckoning it is wrong by orders of magnitude.

*Details about our tech stack*
We are on aws, and the frontends run in docker containers on C5.large ec2 
instances.
We use https internally
We are running node 8 in production
The application is an express app
We use http.request to make downstream requests, but have also experimented 
with using request, with no appreciable difference.
There are no major CPU-heavy processes in our frontend app, and event loop 
latency under normal load is fine
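In case it helps anyone reproduce the event-loop measurement: on Node 8 a 
simple timer-drift probe is enough to watch loop latency under load. A 
sketch - the 100ms interval and 50ms threshold are arbitrary choices of 
mine:

```javascript
// Event-loop lag probe: measure how late a fixed-interval timer fires.
const INTERVAL_MS = 100;
let last = Date.now();

const probe = setInterval(() => {
  const now = Date.now();
  const lag = now - last - INTERVAL_MS; // delay beyond the scheduled tick
  if (lag > 50) console.warn('event loop lag: ' + lag + 'ms');
  last = now;
}, INTERVAL_MS);

probe.unref(); // don't let the probe keep the process alive
```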

*What we have found so far*
The frontend nodes are CPU-bound
Under strain/near breaking point, profiling reveals the frontends spend a 
large amount of time on work related to making downstream HTTP requests, 
but nothing obviously ludicrous. 
Whilst there is no obvious memory leak, the heap dump deltas show a 
proportionately large number of Sockets hanging around - I think this is 
just due to keep-alives, though
Even when not under heavy load, the network latency seems high for an 
internal request - we are seeing average latency of ~20-40ms, vs around 
2-5ms for a Java app that is more or less identical in the calls it 
makes.
A breakdown of the phases of a request (gained from the request library's 
timing facility) reveals that under low load, socket wait, DNS lookup and 
TCP connection take practically no time on average - the bulk of the time 
is spent waiting for the server response
Under load, it appears to be the time to establish a TCP connection and 
the time to 'firstByte' that contribute to the overall increase in HTTP 
request time

*Things we have tried*
We have tried configuring the standard agent with different values of 
maxSockets, maxFreeSockets...
We have tried using different agents 
We have tried disabling socket pooling entirely
We have tried two different client libs - the core http module, and request.
We have matched the number of workers in our cluster to the number of CPUs

Some of these things have yielded gains of ~10%, but I am still convinced 
there is something fundamentally wrong with the architecture and 
configuration of the application - the throughput just seems too low.

I realise I haven't given enough detail to solve anything here, but if 
anyone has any guidance on approaches that have worked for them, other 
knobs to twiddle, guidance on better interpretation of profiling and heap 
dumps, or any other useful pointers I would be very grateful.

Dom
