Aha!  Thank you, Ben.  Your point #2 is especially informative.

Matt's code and mine both show that creating oodles of Buffers (whether
explicitly or implicitly) is not a serious bottleneck compared to the queue
handling.  Processing the resulting queue of writes is what actually costs
time.  I did not realize the writes were queued.  I assumed, incorrectly,
that the writes were filling a preallocated buffer node on a linked list
which would be expanded by chaining as needed.  (Hence my wondering about
how streams are backed.)  Thank you for answering all of my questions in
one fell swoop!
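
For anyone following along, here is a rough sketch of how I read Ben's
side note below: keep calling write() until it returns false, then wait
for 'drain' before writing again.  The function name writeMany and its
parameters are my own invention for illustration, not part of node, and I
have not benchmarked this against my original loop.

```javascript
// Sketch only: write `chunk` to `stream` `count` times, pausing whenever
// write() returns false and resuming on the 'drain' event.
// writeMany, chunk, count and done are hypothetical names, not node APIs.
function writeMany(stream, chunk, count, done) {
  var written = 0;

  function pump() {
    while (written < count) {
      written++;
      // A false return means the stream wants us to stop writing for now.
      if (!stream.write(chunk)) {
        // Stop queueing more writes; continue once 'drain' fires.
        stream.once('drain', pump);
        return;
      }
    }
    stream.end();          // every chunk has been handed to the stream
    if (done) done();
  }

  pump();
}
```

With my earlier test this would be called as something like
writeMany(ws, 'A', Math.pow(2, power)), where ws is the write stream from
my example.  The idea is that the loop only runs while the stream is
accepting writes, rather than queueing all 2^n requests up front.
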
On Mar 20, 2012 5:44 PM, "Ben Noordhuis" <[email protected]> wrote:

> On Tue, Mar 20, 2012 at 22:44, C. Mundi <[email protected]> wrote:
> > Hi Matt,
> >
> > You probably know better than me, but it's not obvious to me that these
> > two examples (both interesting) are especially similar.  For one thing,
> > your example creates a new buffer on every iteration.  My example leaves
> > allocation entirely to streams to decide when to buffer and in what size
> > chunks.
> >
> > And your example runs fast.  I modified your code to read
> >
> > test.js
> > --------
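> > // Allocate 2^power one-byte Buffers and keep them alive in an array.
> > var power = Number(process.argv[2]);
> > var count = Math.pow(2, power);
> > var a = [];
> > for (var i = 0; i < count; i++) {
> >   a.push(new Buffer('A'));
> > }
> > console.error(a.length);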
> >
> >
> > and then I did this
> >
> > $ for ((i=0; i<=20; i+=2)); do echo '----------'; time node test.js $i; done
> >
> > and got  this:
> >
> > ----------
> > 1
> >
> > real    0m0.409s
> > user    0m0.164s
> > sys    0m0.020s
> > ----------
> > 4
> >
> > real    0m0.234s
> > user    0m0.156s
> > sys    0m0.020s
> > ----------
> > 16
> >
> > real    0m0.234s
> > user    0m0.144s
> > sys    0m0.036s
> > ----------
> > 64
> >
> > real    0m0.231s
> > user    0m0.152s
> > sys    0m0.024s
> > ----------
> > 256
> >
> > real    0m0.232s
> > user    0m0.132s
> > sys    0m0.052s
> > ----------
> > 1024
> >
> > real    0m0.232s
> > user    0m0.152s
> > sys    0m0.032s
> > ----------
> > 4096
> >
> > real    0m0.257s
> > user    0m0.180s
> > sys    0m0.028s
> > ----------
> > 16384
> >
> > real    0m0.362s
> > user    0m0.272s
> > sys    0m0.040s
> > ----------
> > 65536
> >
> > real    0m0.605s
> > user    0m0.464s
> > sys    0m0.080s
> > ----------
> > 262144
> >
> > real    0m1.689s
> > user    0m1.448s
> > sys    0m0.160s
> > ----------
> > 1048576
> >
> > real    0m6.444s
> > user    0m5.840s
> > sys    0m0.448s
> >
> > which is a couple of orders of magnitude faster than my example at the
> > same upper limit of 2^20.
> >
> > But remember, my example was not slow to stuff the stream.  The slow part
> > was draining the stream to disk.  Now that could be because the VFS was
> > pushing back (?) against lots of tiny writes, or it could be because (?)
> > node streams are backed with a small buffer and stuffing it forced node
> > to scavenge for memory to create a linked list of tiny buffers.  If we
> > look at the onset of the scaling near 1 KB, we might imagine that
> > stuffing with 1 MB would require ~1000 chunks to be scavenged.  That does
> > not seem like a big job for malloc on a machine with 2 GB and basically
> > no load.
> >
> > Let's look at your array example.  At the high end, I'm allocating and
> > tacking on a million one-byte buffers.  Yet it runs quickly.
> >
> > So I'm still curious to know what determines how the stream created to
> > write to disk actually drains.  That would be node code, right?  I mean,
> > not part of V8.
>
> Matt is mostly right. Quoting your code:
>
>    for (i=0; i<Math.pow(2,power); i++) {
>      ws.write('A');
>    }
>    ws.end();           // write EOF
>
> Two things happen here that are expensive:
>
> 1. 2^n string to buffer conversions (the string 'A' is implicitly
> converted to a buffer).
>
> 2. 2^n write requests are queued. Each request is sent to the thread
> pool. The thread pool is a bottleneck because there are (usually) only
> 4 worker threads[1][2]. File I/O is an area where Node still needs
> some optimization. :-)
>
> On a side note: a best practice when writing to streams is to call
> stream.write() until it returns false, then listen for the 'drain'
> event before you start writing again.
>
> [1] To be clear, writes are ordered. Simply increasing the size of the
> thread pool won't improve performance of a single WriteStream because
> the writes need to be serialized anyway. But if the thread pool is
> already running at full capacity, then your file I/O will suffer too.
>
> [2] dns.lookup() is affected too: it calls getaddrinfo() from the thread
> pool.
>
> --
> Job Board: http://jobs.nodejs.org/
> Posting guidelines:
> https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
> You received this message because you are subscribed to the Google
> Groups "nodejs" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/nodejs?hl=en?hl=en
>

