Hi Isaac,

He could pipe uncompressed output from node process A into a compressor node 
process B. That's the Node Way®, isn't it?

Or, he could do it all in a single node process, but that would mean delegating 
CPU-intensive jobs to background threads, the mere idea of which unnerves 
more than one person here.

Or not?

Cheers,
-- 
Jorge.

On 25/10/2012, at 23:24, Isaac Schlueter wrote:

> Jorge,
> 
> Please do not make snarky remarks about Node on this mailing list.  If
> you have a problem with something, bring it up in a new thread.  If
> you have something to add to this thread, then please do so, but this
> is not helpful.
> 
> 
> On Thu, Oct 25, 2012 at 10:15 PM, Jorge <[email protected]> wrote:
>> Threads are evil™, don't use threads.
>> 
>> The Node Way® (just don't ask) is to pipeline processes as in the good ol' 
>> 70s. Flower Power, peace and love bro, and etc.
>> 
>> Cheers,
>> --
>> Jorge.
>> 
>> On 25/10/2012, at 19:24, Vadim Antonov wrote:
>> 
>>> There are 8 processes (1 per core, created with the cluster lib) and every 
>>> process has 6 threads.
>>> Do you have an ETA for the thread pool updates? Is there a way we can 
>>> help you?
>>> 
>>> Thank you.
>>> --
>>> Vadim
>>> 
>>> On Wednesday, October 24, 2012 5:41:36 PM UTC-7, Ben Noordhuis wrote:
>>> On Thu, Oct 25, 2012 at 12:26 AM, Vadim Antonov <[email protected]> wrote:
>>>> Hi everybody,
>>>> 
>>>> I've tried to google for nodejs zlib performance and didn't find any
>>>> useful information.
>>>> I work on a high-load API which communicates with multiple backend
>>>> servers. Throughput on 1 VM with 8 cores is around 200 QPS, and for every
>>>> query the application makes up to 50 queries to caches/backends. Every
>>>> response from a cache/backend is compressed and requires decompression.
>>>> It ends up that the application needs to make up to 10,000 decompressions
>>>> per second.
>>>> 
>>>> Based on the zlib code, every decompression uses a new thread from the
>>>> thread pool (one binding to the C++ code per decompression; we use the
>>>> naive method http://nodejs.org/api/zlib.html#zlib_zlib_gunzip_buf_callback
>>>> - there is no way to use the stream methods):
>>>> https://github.com/joyent/node/blob/master/lib/zlib.js#L272
>>>> https://github.com/joyent/node/blob/master/src/node_zlib.cc#L430
>>>> 
>>>> The service has started to see huge performance spikes a couple of times
>>>> a day, coming from the decompression code: from time to time a
>>>> decompression takes up to 5 seconds, and all decompression calls are
>>>> blocked during this time.
>>>> I think the issue is coming from the thread pool (uv_work_t) which zlib
>>>> is using. Does anybody else see the same behavior? Are there any
>>>> workarounds for it? Where can I find documentation about it? V8 code?
>>>> At this point we've started to use the snappy library
>>>> (https://github.com/kesla/node-snappy) with sync compression/decompression
>>>> calls. But the service still needs to decompress backend responses with
>>>> gzip...
>>>> 
>>>> To illustrate a little what I'm talking about, here is a small example
>>>> (it generates 'count' buffers, decompresses them 'count2' times, and
>>>> writes all + min/max/avg timings).
>>>> 
>>>> var _ = require('underscore');
>>>> var rbytes = require('rbytes');
>>>> var step = require('step');
>>>> var zlib = require('zlib');
>>>> 
>>>> var count = 10;
>>>> var count2 = 1000;
>>>> var count3 = 0;
>>>> var len = 1024;
>>>> var buffers = [];
>>>> var timings = {};
>>>> var totalTime = 0;
>>>> var concurrent = 0;
>>>> var maxConcurrent = 128;
>>>> 
>>>> function addCompressed(done) {
>>>>    zlib.gzip(rbytes.randomBytes(len), function (error, compressed) {
>>>>        buffers.push(compressed);
>>>>        done();
>>>>    });
>>>> }
>>>> 
>>>> function decompress(done) {
>>>>    var time = Date.now();
>>>>    zlib.gunzip(buffers[Math.floor(Math.random() * count)],
>>>>                function (error, decompressed) {
>>>>        if (error) {
>>>>            console.log(error);
>>>>        }
>>>>        var total = Date.now() - time;
>>>>        totalTime += total;
>>>>        if (!timings[total]) {
>>>>            timings[total] = 0;
>>>>        }
>>>>        timings[total]++;
>>>>        ++count3;
>>>>        if (done && count3 == count2) {
>>>>            done();
>>>>        }
>>>>    });
>>>> }
>>>> 
>>>> step(
>>>>    function genBuffers() {
>>>>        for(var i = 0; i < count; ++i) {
>>>>            var next = this.parallel();
>>>>            addCompressed(next);
>>>>        }
>>>>    },
>>>>    function runDecompression() {
>>>>        var next = this;
>>>>        for(var i = 0; i < count2; ++i) {
>>>>            decompress(next);
>>>>        }
>>>>    },
>>>>    function writeTotal() {
>>>>        var min = null;
>>>>        var max = -1;
>>>>        _.each(timings, function(count, elapsed) {
>>>>            // underscore passes (value, key): sample count, elapsed ms
>>>>            elapsed = Number(elapsed);
>>>>            max = Math.max(elapsed, max);
>>>>            min = min === null ? elapsed : Math.min(min, elapsed);
>>>>            console.log(elapsed + ' ' + count);
>>>>        });
>>>>        console.log('min ' + min);
>>>>        console.log('max ' + max);
>>>>        console.log('avg ' + totalTime / count2);
>>>>    }
>>>> );
>>>> 
>>>> Here are the results for different numbers of decompressions (number of
>>>> decompressions, min/max/avg timings):
>>>> 10        0       1      0.1
>>>> 100       1       6      3.8
>>>> 1000     19      47     30.7
>>>> 10000   149     382    255.0
>>>> 100000 4120   18607  16094.3
>>>> 
>>>> Decompression time grows with the number of concurrent decompressions.
>>>> Is there a way to make it faster, or to limit the number of threads zlib
>>>> is using?
>>> 
>>> How many active threads do you see in e.g. htop? There should
>>> preferably be as many threads as there are cores in your machine (give
>>> or take one).
>>> 
>>> Aside, the current thread pool implementation is a known bottleneck in
>>> node right now. We're working on addressing that in master but it's
>>> not done yet.
>>> 
>>> --
>>> Job Board: http://jobs.nodejs.org/
>>> Posting guidelines: 
>>> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
>>> You received this message because you are subscribed to the Google
>>> Groups "nodejs" group.
>>> To post to this group, send email to [email protected]
>>> To unsubscribe from this group, send email to
>>> [email protected]
>>> For more options, visit this group at
>>> http://groups.google.com/group/nodejs?hl=en
>> 
> 
