Part of the poor performance you're seeing comes from the fact that all your 
benchmarks are synchronous: you're measuring the time for a *for* loop to run 
to completion. This *for* loop runs on a single tick, and that has different 
side effects in each case.


   1. *sync:* every iteration of the for loop hands the dummy message to the 
   writeSync C counterpart (which is, down the line, write(2)). So your dummy 
   message makes a very brief appearance in user memory before being flushed 
   to the system buffer and then to disk. Every tick is slower, but overall 
   performance should be better because you never hit the memory limits of 
   the V8 VM or the libuv event loop.
   2. *async:* this one does not measure the same thing as the first, since 
   you're not waiting for the writes to be confirmed. This might be how you 
   use your logger, but it is not the same measurement. Here you measure how 
   fast you can push write requests into the libuv queue, effectively 
   gathering a huge number of messages in libuv-managed memory. The memory 
   used is roughly 101 * n bytes, not counting whatever structure libuv puts 
   around each request.
   3. *stream:* this one measures roughly the same thing, given that you 
   changed it to listen for the 'finish' event, which means all buffered data 
   has been flushed to the system queue. However, since you're adding your 
   messages to the queue on the very same tick, you're once again gathering 
   all pending messages in memory. This time it is V8-managed memory, which 
   means it is limited, it is garbage collected, it stores strings in V8's 
   internal encoding (UTF-16 or Latin-1) rather than raw bytes, and it 
   carries significant per-object overhead, consuming even more memory. My 
   guess is that you're hitting the limits of V8 memory management, most 
   probably garbage collection and such.

You should change your *async* version to stop the timer only when all queued 
messages have actually been flushed. You should change both your *stream* and 
your *async* versions to not write 100,000 messages on the very same tick, as 
this is very unlikely to happen in common usage for a logger. You should 
probably batch those writes into a reasonable maximum size, say 100, and 
distribute the batches over successive ticks.


*Notes:*
*This may have changed since the last time I read the node.js source code, 
but it is my understanding that writes are NEVER synchronous when using 
fs.write or fs.WriteStream.*

On Tuesday, 20 May 2014 19:29:58 UTC+2, Jeff Jolma wrote:
>
> Hello.
>
> I am working on a node app and found that its performance bottleneck is 
> logging. And digging deeper, it looks like the root bottleneck is on the 
> WriteStream.
>
> I ran some quick benchmarks of the WriteStream performance, and it seems 
> really slow.  Can someone help me understand why it is taking so long and 
> if I can improve the performance?  Or, maybe I am doing something wrong 
> with my benchmarks.
>
> I get the following output running on my MacBook:
>
>
> $ node --harmony benchmarks.js 
> Running benchmarks with 100000 writes
> fs.writeSync took 0.302 seconds
> fs.write took 0.235 seconds
> writeStream.write took 4.565 seconds
>
> $ cat benchmarks.js
> 'use strict';
> const fs = require('fs');
>
> const NUM_WRITES = 100 * 1000;
> const DUMMY_MESSAGE = 'Dummy log message that is about 100 
> characters.....................................................\n';
> const DUMMY_MESSAGE_BUFFER = new Buffer(DUMMY_MESSAGE);
>
> function benchmarkFileSync() {
>   var fd = fs.openSync('./dummy.log', 'w');
>
>   const startTime = new Date();
>   for (let i = 0; i < NUM_WRITES; i++) {
>     fs.writeSync(fd, DUMMY_MESSAGE);
>   }
>   const diff = (new Date() - startTime) / 1000;
>   console.log('fs.writeSync took', diff, 'seconds');
>   fs.closeSync(fd);
> }
>
> function benchmarkFileAsync() {
>   var fd = fs.openSync('./dummyAsync.log', 'w');
>
>   const startTime = new Date();
>   for (let i = 0; i < NUM_WRITES; i++) {
>     // yes, this is BAD since it isn't waiting for the callback. It
>     // drops some of these writes on the floor
>     fs.write(fd, DUMMY_MESSAGE_BUFFER, 0, DUMMY_MESSAGE_BUFFER.length, 
> null, function cb() {
>       // ignore the callback
>     });
>   }
>   const diff = (new Date() - startTime) / 1000;
>   console.log('fs.write took', diff, 'seconds');
>   fs.closeSync(fd);
> }
>
> function benchmarkWriteStream() {
>   const stream = fs.createWriteStream(
>     './dummyWriteStream.log', {
>       encoding: "utf8",
>       mode: parseInt('0644', 8)
>     }
>   );
>
>   const startTime = new Date();
>   for(let i = 0; i < NUM_WRITES; i++) {
>     let ret = stream.write(DUMMY_MESSAGE);
>   }
>
>   const interval = setInterval(function checkIfDone() {
>     if (stream.bytesWritten === DUMMY_MESSAGE.length * NUM_WRITES) {
>       const diff = (new Date() - startTime) / 1000;
>       console.log('writeStream.write took', diff, 'seconds');
>       clearInterval(interval);
>     }
>   }, 100);
> }
>
> console.log('Running benchmarks with', NUM_WRITES, 'writes');
> benchmarkFileSync();
> benchmarkFileAsync();
> benchmarkWriteStream();
>
> The performance becomes extremely pathological when I increase the number 
> of writes beyond this. For example, with 200,000:
>
> $ node --harmony benchmarks.js 
> Running benchmarks with 200000 writes
> fs.writeSync took 0.646 seconds
> fs.write took 0.479 seconds
> writeStream.write took 290.4 seconds
>
>
>
> Thanks,
> Jeff
>
>
