Part of the poor performance you're seeing is because all your benchmarks
are synchronous: you're measuring the time for a *for* loop to run to
completion. That *for* loop runs on a single tick, and this has different
side effects in each case.
1. *sync:* Every iteration of the *for* loop passes the dummy message to
the writeSync C counterpart (which is, down the line, write(3)). So your
dummy message makes a very brief appearance in user memory before being
flushed to the system buffer and then to disk. Each tick is slower, but the
overall performance should be better because you never hit the memory
limits of the V8 VM or the UV event loop.
2. *async:* This one does not measure the same thing as the first one,
since you're not waiting for the writes to be confirmed. This might be how
you use your logger, but it is not the same measurement. Here, you measure
how fast you can put write requests into the UV queue, effectively
gathering a huge number of messages into UV-managed memory. The memory used
is roughly 101 * n bytes, not counting whatever structure UV puts around
it.
3. *stream:* This one measures roughly the same thing, given that you
changed it to listen for the 'finish' event, which means all buffered data
has been flushed to the system queue. However, since you're adding your
messages to the queue on the very same tick, you're once again gathering
all your pending messages in memory. This time, it is V8-managed memory.
This means it is limited, it is garbage collected, it stores strings
encoded as UTF-8, and it has significant structural overhead consuming
even more memory. My guess is that you're hitting the performance limits
of V8 memory management, most probably garbage collection and such.
You should change your *async* version to stop the timer only when all
queued messages are effectively flushed. You should change both your
*stream* AND your *async* versions to not write 100,000 messages on the
very same tick, as this is very unlikely to happen in common usage for a
logger. You should probably batch those writes into a reasonable maximum
amount, like 100, and distribute the writes over successive ticks.
*Notes:*
*This may have changed since the last time I read the Node.js source code,
but it is my understanding that writes are NEVER synchronous when using
fs.write or fs.WriteStream.*
On Tuesday, 20 May 2014 19:29:58 UTC+2, Jeff Jolma wrote:
>
> Hello.
>
> I am working on a node app and found that its performance bottleneck is
> logging. And digging deeper, it looks like the root bottleneck is on the
> WriteStream.
>
> I ran some quick benchmarks of the WriteStream performance, and it seems
> really slow. Can someone help me understand why it is taking so long and
> if I can improve the performance? Or, maybe I am doing something wrong
> with my benchmarks.
>
> I get the following output running on my MacBook:
>
>
> $ node --harmony benchmarks.js
> Running benchmarks with 100000 writes
> fs.writeSync took 0.302 seconds
> fs.write took 0.235 seconds
> writeStream.write took 4.565 seconds
>
> $ cat benchmarks.js
> 'use strict';
> const fs = require('fs');
>
> const NUM_WRITES = 100 * 1000;
> const DUMMY_MESSAGE = 'Dummy log message that is about 100 characters.....................................................\n';
> const DUMMY_MESSAGE_BUFFER = new Buffer(DUMMY_MESSAGE);
>
> function benchmarkFileSync() {
> var fd = fs.openSync('./dummy.log', 'w');
>
> const startTime = new Date();
> for (let i = 0; i < NUM_WRITES; i++) {
> fs.writeSync(fd, DUMMY_MESSAGE);
> }
> const diff = (new Date() - startTime) / 1000;
> console.log('fs.writeSync took', diff, 'seconds');
> fs.closeSync(fd);
> }
>
> function benchmarkFileAsync() {
> var fd = fs.openSync('./dummyAsync.log', 'w');
>
> const startTime = new Date();
> for (let i = 0; i < NUM_WRITES; i++) {
> // yes, this is BAD since it isn't waiting for the callback. It
> // drops some of these writes on the floor
> fs.write(fd, DUMMY_MESSAGE_BUFFER, 0, DUMMY_MESSAGE_BUFFER.length,
> null, function cb() {
> // ignore the callback
> });
> }
> const diff = (new Date() - startTime) / 1000;
> console.log('fs.write took', diff, 'seconds');
> fs.closeSync(fd);
> }
>
> function benchmarkWriteStream() {
> const stream = fs.createWriteStream(
> './dummyWriteStream.log', {
> encoding: "utf8",
> mode: parseInt('0644', 8)
> }
> );
>
> const startTime = new Date();
> for(let i = 0; i < NUM_WRITES; i++) {
> let ret = stream.write(DUMMY_MESSAGE);
> }
>
> const interval = setInterval(function checkIfDone() {
> if (stream.bytesWritten === DUMMY_MESSAGE.length * NUM_WRITES) {
> const diff = (new Date() - startTime) / 1000;
> console.log('writeStream.write took', diff, 'seconds');
> clearInterval(interval);
> }
> }, 100);
> }
>
> console.log('Running benchmarks with', NUM_WRITES, 'writes');
> benchmarkFileSync();
> benchmarkFileAsync();
> benchmarkWriteStream();
>
> The performance becomes extremely pathological when I increase the number
> of writes beyond this. For example, with 200,000:
>
> $ node --harmony benchmarks.js
> Running benchmarks with 200000 writes
> fs.writeSync took 0.646 seconds
> fs.write took 0.479 seconds
> writeStream.write took 290.4 seconds
>
>
>
> Thanks,
> Jeff
>
>
---
You received this message because you are subscribed to the Google Groups
"nodejs" group.