On Wed, Mar 23, 2016 at 12:25 PM, Harry Simons <simonsha...@gmail.com>
wrote:

> Hello,
>
> I have not been able to see the following points addressed in all the
> online material I have read to date on Node, and so, hope to be enlightened
> by some very smart and knowledgeable folks here that I presume would be
> reading this.
>

They probably have been, but I'll try and address them here for the benefit
of the community.


> 1. Since I/O happens asynchronously in worker threads, it is possible for
> a single Node process to quickly/efficiently accept 1000s of incoming
> requests compared to something like Apache. But, surely, the outgoing
> responses for each of those requests will take their own time, won't they?
>

Of course. There's no free lunch.


> For example, if an isolated and primarily an I/O bound request takes, say,
> 3 seconds to get serviced (with no other load on the system), then if
> concurrently hit with 5000 such requests, won't Node take *a lot* of time
> to service them all, *fully*?
>

What is taking 3 seconds? The answer, as with all technology, is "it
depends". If you block the CPU for 3 seconds then yes, of course your app
will suck. If you're just sitting waiting on other I/O (e.g. a network
request) for 3 seconds, then lots can happen in the gaps.
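
To make that concrete, here's a minimal sketch of the non-blocking case
(the 3-second delay and the port are just placeholders):

const http = require('http');

http.createServer((req, res) => {
  // The timer stands in for a slow network call. While it's pending, the
  // event loop keeps accepting and parsing other requests, so thousands
  // of these 3-second waits can overlap on the one thread.
  setTimeout(() => {
    res.end('done after ~3s of non-blocking waiting\n');
  }, 3000);
  // Burning the CPU for 3 seconds instead (e.g. a busy loop) would stall
  // every other request, because there is only one JavaScript thread.
}).listen(8080);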


> If this 3-second task happens to involve exclusive access to the disk,
> then it would take 5000 x 3 sec = 15000 seconds, or over 4 hours of wait to
> see the response for the last request coming out of the Node app. In such
> scenarios, would it be correct to claim that a single-process Node
> configuration can 'handle' 1000s of requests per second (granted, a
> thread-server like Apache would do a lot worse with 5000 threads) when all
> that Node may be doing is simply putting the requests 'on hold' till they
> get *fully* serviced instead of rejecting them outright on their initial
> arrival? I'm asking this because as I'm reading up on Node,
> I'm often hearing how Node can address the C10K problem without any
> co-mention of any specific application setups or any specific application
> types that Node can or cannot handle... other than the broad, CPU- vs
> I/O-bound type of application classification.
>

I think you've just generally misread a lot of stuff about this, honestly.
Disk I/O is "complicated" in Node (because async I/O to disk is complicated
in operating systems; it's not Node's fault). But not many web apps use the
"fs" module in their request handlers directly. Node uses a thread pool for
filesystem requests on Unix-like OSs, so there are limits there, but it's
very rare to see that as an issue when developing Node apps at scale. When
you talk to any of the DB modules you're using network I/O in Node, not
filesystem I/O.
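
To make the filesystem case concrete, here's a minimal sketch (the file
path and port are made up):

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  // fs.readFile is handed off to libuv's thread pool, so the event-loop
  // thread stays free to accept other connections while the disk read
  // completes. That pool is small by default (4 threads, tunable via the
  // UV_THREADPOOL_SIZE environment variable), which is where the limits
  // mentioned above come from.
  fs.readFile('/tmp/example.txt', (err, data) => {
    if (err) {
      res.writeHead(500);
      return res.end('read failed\n');
    }
    res.end(data);
  });
}).listen(3000);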


> 2. What about the context switching overhead of the workers in the
> worker-thread pool? If C10K requests hit a Node-based application, won't
> the workers in the worker-thread pool end up context-switching just as much
> as the user threads in the thread pool of a regular, threaded-server (like
> Apache)...?
>

No, because most of Node isn't threaded. Only a few parts of Node use a
thread pool. All network I/O uses the OS's native async notification
mechanisms (epoll on Linux, kqueue on the BSDs and OS X, IOCP on Windows).
So there's no per-request thread to context-switch: the request handling
all runs on the one event-loop thread.
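
For example, a single Node process can have hundreds of sockets in flight
at once without spawning a single extra thread. A small sketch (the URL
and the count are arbitrary):

const http = require('http');

const total = 200;
let settled = 0;

function done() {
  if (++settled === total) {
    console.log('all ' + total + ' requests settled');
  }
}

for (let i = 0; i < total; i++) {
  // Each request just registers interest with the OS notification
  // mechanism (epoll/kqueue/IOCP); no thread sits blocked per connection.
  http.get('http://example.com/', (res) => {
    res.resume(); // drain the response body
    done();
  }).on('error', done);
}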


> because, all that would have happened in Node's event thread would be a
> quick request-parsing and request-routing, with the remainder (or, the
> bulk) of the processing still happening in the worker thread? That is, does
> it really matter (as far as minimization of thread context-switching is
> concerned) whether a request/response is handled from start to finish in a
> single thread (in the manner of a threaded server like Apache), or whether it
> happens transparently in a Node-managed worker thread with only minimal
> work (of request parsing and routing) subtracted from it? Ignore here the
> simpler, single-threaded user model of coding that comes with an evented
> server like Node.
>

This is why you need to read up on the C10K docs more: there's far too big
an overhead in moving between kernel and user space, a problem that threaded
servers suffer from. That's why even Apache offers the "event" MPM, why
nginx is so much faster than Apache, and why WhatsApp wrote their system in
Erlang (which uses an event loop like Node, but offers some very nice
scaling tools on top of that which Node doesn't). There are reasons these
things scale better, and it's because the OS sucks at juggling data between
thousands of threads or processes (call them what you want).


> 3. If the RDBMS instance (say, MySQL) is co-located on the Node server
> box, then would it be correct to classify a database CRUD operation as a
> pure I/O task? My understanding is, a CRUD operation on a large, relational
> database will typically involve heavyduty CPU- and I/O-processing, and not
> just I/O-processing. However, the online material that I've been reading
> seems to label a 'database call' as merely an 'I/O call', which supposedly
> makes your application an I/O-bound application if that is the only thing
> your application is (mostly) doing.
>

Here you need to understand the performance difference between disk (even
SSD) and CPU: it's several orders of magnitude. CPU-bound processing can
take too long, but don't code your software that way unless you can't help
it. When you can't help it, make sure you use something (like a queueing
system) that can deal with that work while letting everything else run.
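
One common pattern is to push the CPU-heavy part out of the event-loop
process entirely. This is only a sketch (the file names and the Fibonacci
stand-in are made up); a real setup would use a proper job queue or a pool
of long-lived workers rather than forking per request, but the principle
is the same, keep the event loop free:

// server.js
const http = require('http');
const fork = require('child_process').fork;

http.createServer((req, res) => {
  const worker = fork('./heavy-job.js'); // runs in its own process
  worker.on('message', (result) => {
    res.end('result: ' + result + '\n');
  });
  worker.send({ n: 40 }); // whatever parameters the job needs
}).listen(3000);

// heavy-job.js
process.on('message', (job) => {
  // Stand-in for real CPU-bound work: naive Fibonacci.
  function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }
  process.send(fib(job.n));
  process.exit(0);
});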


> 4. A final question (related to the above themes) that may require
> knowledge of modern hardware and OS which I am not fully up-to-date on. Can
> I/O (on a given I/O device) be done in parallel, or at least concurrently
> if not in parallel, and THUS scale proportionally with user count?
>

That depends. A single disk? No, of course not: it has a fixed rotation
speed and a single head assembly. An array of disks? Maybe. A disk with a
cache? Possibly. See how complex this question gets?


> Example: Suppose I have written a file-serving Node app that serves files
> from the local hard-disk, making it strongly an I/O-bound app.
>

Assuming no caching. But why would you serve straight off the disk, with no
cache, if you want to serve thousands of clients?
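
If you did build it that way, even a crude in-memory cache changes the
picture completely. A minimal sketch (no cache invalidation, no path
sanitization, names made up):

const http = require('http');
const fs = require('fs');

const cache = new Map();

function readCached(path, cb) {
  if (cache.has(path)) {
    // Served from memory: the disk is never touched again for this file.
    return process.nextTick(() => cb(null, cache.get(path)));
  }
  fs.readFile(path, (err, data) => {
    if (!err) cache.set(path, data);
    cb(err, data);
  });
}

http.createServer((req, res) => {
  readCached('./public' + req.url, (err, data) => {
    if (err) {
      res.writeHead(404);
      return res.end();
    }
    res.end(data);
  });
}).listen(3000);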

I think that pretty much answers the rest of your question, so I didn't add
further answers.

Matt.
