[nodejs] Re: Multi-threaded Node.js

mogill Sat, 05 Apr 2014 18:52:49 -0700

Bruno,

> What you call threads are actually processes because the fork call creates 
> a process.
>


That's true; I will change "thread" to "process".  Originally I used the 
term "node" as a pun on using Node.js, but eventually decided it was too 
confusing to run many nodes of Node on a node and changed it to "thread". 
"Process" best describes what's happening, though.

 

> In each of these processes you have one main thread that runs node's event 
> loop. It is crucial to *never* block this thread. Blocking it may not look 
> so dramatic for computations that cooperate with other processes through 
> EMS calls but it is catastrophic if your node process is responding to 
> external events because these events won't get serviced while EMS 
> computations are blocked, waiting on empty/full conditions.
>

I was aware of this but convinced myself the primary use cases would be 
batch jobs that were big enough to merit parallelization, not inside an 
interactive application.  Considering one of the first comments about EMS 
is an interactive use case, I'm willing to believe that assumption was 
wrong.  
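
To make the concern concrete, here is a minimal sketch (not EMS code) of what 
blocking the main thread does to pending events.  The busy-wait in spinFor is 
only a stand-in for an EMS call waiting on a full/empty condition, and both 
function names are made up for this illustration.

```javascript
// Illustration only: while the main thread is inside a synchronous wait,
// the event loop cannot run any queued callback.
function spinFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait, standing in for a blocked EMS operation
  }
}

function callbackRanDuringBlock(ms) {
  let fired = false;
  // Stands in for an external event (request, timer, I/O completion).
  setTimeout(() => { fired = true; }, 0);
  spinFor(ms);   // the main thread is blocked here
  return fired;  // the timer has had no chance to run yet
}

console.log(callbackRanDuringBlock(50)); // prints false
```

The timer callback only runs after the current synchronous code returns to 
the event loop, which is why an interactive process stops responding for as 
long as the wait lasts.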

EMS is a memory model: it is agnostic to the source of parallelism and 
should work with thread pools, async tasks, any number or kind of forked 
processes, and processes written in other languages, in any combination.  The 
parallelism built into EMS is essentially a pool of threads with loop-level 
and BSP interfaces, which are more idiomatic for homogeneous parallelism 
than explicitly managing threads.

Having said that, heterogeneous parallelism is typical in web apps, and it 
must be managed while updating shared state that must be kept coherent, so 
EMS can still be beneficial even when the workload is not a batch job with 
dozens of threads and gigabytes of data.


 

> The classical way to implement callback-based APIs on top of blocking APIs 
> in node is to use the thread pool. You decompose every function into 3 
> pieces:
>

I'm concerned that creating a thread for every memory operation that needs 
a callback will overwhelm the system.  The per-element tags use very little 
memory which means synchronization does not need to be rationed, but EMS 
callbacks would be.  An EMS array with 1,000 empty elements that were 
eventually going to be populated in some unknown order requires 1,000 
outstanding readFEwithCB(idx, callback) calls, which would be 1,000 threads 
in addition to the ones executing.  A per-process callback queue might 
help, but I need to think more about gracefully handling thousands of 
outstanding operations.
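
As a rough sketch of the per-process queue idea, and not a design I have 
settled on: outstanding reads sit in one list and are retried with a 
non-blocking "try" read, so 1,000 pending callbacks cost 1,000 queue entries 
rather than 1,000 threads.  Everything here is hypothetical.  tryReadFE, 
CallbackQueue and makeCells are not part of EMS, and a plain JavaScript array 
of {full, value} cells stands in for an EMS array with per-element tags.

```javascript
// Hypothetical sketch: none of these names exist in the EMS API.
// A plain array of cells stands in for shared memory with full/empty tags.
function makeCells(n) {
  return Array.from({ length: n }, () => ({ full: false, value: undefined }));
}

// Non-blocking "try" read: reports whether the element was full; if it was,
// returns the value and marks the element empty (full -> empty transition).
function tryReadFE(cells, idx) {
  const cell = cells[idx];
  if (!cell.full) return { completed: false };
  cell.full = false;
  return { completed: true, value: cell.value };
}

// One queue per process holds every outstanding read.
class CallbackQueue {
  constructor(cells) {
    this.cells = cells;
    this.pending = [];
  }

  readFEwithCB(idx, callback) {
    this.pending.push({ idx, callback });
  }

  // One pass over the outstanding reads; returns how many callbacks ran.
  poll() {
    const waiting = this.pending;
    this.pending = [];
    let completedCount = 0;
    for (const req of waiting) {
      const result = tryReadFE(this.cells, req.idx);
      if (result.completed) {
        completedCount += 1;
        req.callback(result.value);
      } else {
        this.pending.push(req); // still empty, keep waiting
      }
    }
    return completedCount;
  }
}

const cells = makeCells(1000);
const queue = new CallbackQueue(cells);
const seen = [];
for (let i = 0; i < 1000; i++) {
  queue.readFEwithCB(i, (value) => seen.push(value));
}

// Another process would fill elements in some unknown order; simulate two.
cells[7] = { full: true, value: 'seven' };
cells[3] = { full: true, value: 'three' };

queue.poll(); // runs the two ready callbacks, 998 reads stay queued
console.log(seen, queue.pending.length);
```

In a real process poll() would have to be driven from the event loop (for 
example with setImmediate or a timer), and each pass costs time proportional 
to the number of outstanding reads, so this trades threads for polling 
overhead rather than removing the cost.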

Also, thanks for opening the callback issue at GitHub, I'll follow up on 
that over there.

               -J

 

>
>    - A *start* function that gets called from the main loop. It creates a 
>    work item and passes it to the thread pool
>    - A *worker* function that processes the work item. This function is 
>    called from the thread pool. It executes the blocking call(s).
>    - An *after* function that gets called from the main loop after the 
>    worker function has returned. This function executes the callback.
>
> This is described in https://kkaefer.com/node-cpp-modules/#threadpool and 
> this is how many node binary modules are implemented (the one I know best 
> is https://github.com/joeferner/node-oracle). This is not too hard to 
> implement if you start from a good code pattern.
>
> Bruno
> On Friday, April 4, 2014 7:35:45 PM UTC+2, [email protected]:
>>
>> Hi Bruno,
>>
>> Yes, Full-Empty transitions are blocking operations.  The underlying 
>> principle is if multiple threads are sharing data which must be updated 
>> atomically, their execution is going to be sequential anyhow.  Data which 
>> is not contended for does not have a serialization penalty, so actual 
>> blocking only occurs with highly contended data.  Also, just to clarify: 
>> all threads participate in EMS parallel loops -- there isn't a "master" 
>> thread that is blocked waiting for a parallel loop to finish (although you 
>> can make one with a conditional that inspects ems.myID).
>>
>> I was already considering non-blocking alternatives for all the EMS 
>> intrinsics, specifically "try" versions which return an object 
>> containing the return value (if needed) and an indicator of whether the 
>> operation completed or not.  After thinking about it for a day or so, I 
>> agree getting a callback upon completion of a Full/Empty operation is more 
>> in the spirit of Node; however, I'm not sure of the best way to implement 
>> that.
>>
>> The challenge comes from the fact EMS threads do not communicate with 
>> each other except through data stored in memory, so there's no mechanism to 
>> send notification of an event from one thread to another.  The two choices 
>> for adding this mechanism are to build on Node's existing TCP 
>> messaging/event infrastructure, or to build one into EMS based on shared 
>> memory.
>>
>> The problem with the former is that it can't keep up with millions of events 
>> per second, and the problem with the latter is that callbacks can only be 
>> discovered and executed when the program invokes another EMS operation, 
>> which would result in deadlocks: a thread waiting for a callback of old 
>> data can't call EMS for new data and thus never executes the callback that 
>> would unblock progress.
>>
>> It's worth noting that EMS lets you over-subscribe the system, so if you 
>> know some fraction of your threads will be blocked at any time, you can 
>> create more threads than cores to take advantage of the idle cores. 
>> Remember to set pinThreads to false so threads can move to idle cores: 
>> require('ems', false).
>>
>> I'll put more thought into how to implement callbacks in EMS, if you have 
>> any ideas please let me know!
>>
>>              -J
>>
>>
>> On Thursday, April 3, 2014 10:23:35 AM UTC-7, Bruno Jouhier wrote:
>>>
>>> That's cool, but if I understand correctly, calls to EMS are blocking: they 
>>> wait on state transitions that get triggered by the processes that you 
>>> forked (the EMStransitionFEtag calls in ems.cpp). So EMS will let you 
>>> speed up computations by running them in parallel on forked processes, but 
>>> your main loop will be blocked until the computation completes (you'll 
>>> be waiting on "full" markers that get set when results become available) 
>>> :-(.
>>>
>>>
>>> Any plans to have async variants of these calls that don't block but signal 
>>> their completion through a callback?
>>>
>>>
>>> Bruno
>>>
>>> On Tuesday, April 1, 2014 7:58:54 PM UTC+2, 
>>> [email protected] wrote:
>>>>
>>>> I just published an NPM package that adds shared memory parallelism, 
>>>> Transactional Memory, and fine-grained synchronization to Node:
>>>> GitHub: SyntheticSemantics/ems<https://github.com/SyntheticSemantics/ems>
>>>> NPM: ems <https://www.npmjs.org/package/ems>  or just: npm install ems
>>>>
>>>> It may not be exactly what you're looking for, but it is effective for 
>>>> jobs too large for one core but not large enough for a scalable cluster. 
>>>> The programming and execution model is somewhere between OpenMP 
>>>> multitasking and Partitioned Global Address Space (PGAS) tools.  It's 
>>>> built on Node's existing fork mechanisms so all legacy code, packages, 
>>>> and node distributions work normally -- only Extended Memory Semantics 
>>>> (EMS) objects are shared between threads.
>>>>
>>>>            -J
>>>>
>>>>
>>>> On Monday, February 18, 2013 6:29:28 AM UTC-8, RF wrote:
>>>>>
>>>>> It seems that my first question is answered (yes - threads-a-gogo - 
>>>>> but without allowing shared mutable objects). 
>>>>> My second question is possibly redundant, then, but whether or not 
>>>>> this is a desirable feature would appear to be debatable.
>>>>>
>>>>> For what it's worth, I think having more choices is always a good 
>>>>> thing, although I would not argue that a true multi-threaded solution 
>>>>> should be integrated into Node core given its nature.
>>>>> The W16 project, from what I understand, is an experiment that 
>>>>> involves a modified V8 engine to allow multiple cores to be utilized, 
>>>>> where each core shares a single common event loop from which events are 
>>>>> assigned and executed, using mutexes for synchronization issues.
>>>>>
>>>>> I think I've got what I needed to know.
>>>>> Thanks to all of you for the responses, in particular that blog post 
>>>>> by Bruno was very informative.
>>>>>
>>>>> Regards,
>>>>> -Rob
>>>>>
>>>>> On Monday, 18 February 2013 00:15:48 UTC, RF wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm a CS student who is new to Node, and I have two questions:
>>>>>>
>>>>>>    1. Is there currently an existing mechanism (e.g. module, core 
>>>>>>    functionality) that allows Node applications to spawn multiple 
>>>>>>    threads (to take advantage of multiple cores for true parallelism)?
>>>>>>    2. If not, would this be a desirable feature?
>>>>>>
>>>>>> My understanding is that Node applications use a single thread to 
>>>>>> handle everything by queuing events on an event loop to be processed 
>>>>>> sequentially.
>>>>>> I also understand that this is the core feature that allows Node to 
>>>>>> grant efficiency gains for specific types of applications, and is the 
>>>>>> (main?) source of Node's popularity.
>>>>>>
>>>>>> Given this fact then (and assuming that it's correct), it would seem 
>>>>>> counter-intuitive to enable multi-threaded functionality in Node when 
>>>>>> there are other languages/frameworks available potentially more suited 
>>>>>> to multi-threaded behavior. 
>>>>>> However, an example use case that I'm thinking of is a situation 
>>>>>> whereby an existing Node application needs to be adapted or extended 
>>>>>> with some functionality that would benefit from true parallelism.
>>>>>> So, maybe 3 or 4 threads could be created that would handle 3 or 4 
>>>>>> tasks more efficiently than Node's existing sequential behavior, while 
>>>>>> still taking advantage of Node's established execution model in other 
>>>>>> areas of the application.
>>>>>>
>>>>>> I was thinking along the lines of creating a Node module that exposes 
>>>>>> an interface for creating threads and supplying them with the necessary 
>>>>>> function (and also some mechanisms for dealing with shared data 
>>>>>> concurrency and consensus issues).
>>>>>> I have searched unsuccessfully through available resources in an 
>>>>>> attempt to answer the above questions, so I'm hoping that someone can 
>>>>>> help me out.
>>>>>>
>>>>>> Regards,
>>>>>> -Rob
>>>>>>
>>>>>
