[nodejs] Re: Multi-threaded Node.js

Bruno Jouhier Fri, 04 Apr 2014 15:47:32 -0700

I created an issue for the callback-based API: 
https://github.com/SyntheticSemantics/ems/issues/1
We can discuss the details there.


Bruno

On Saturday, April 5, 2014 12:04:28 AM UTC+2, Bruno Jouhier wrote:
>
> Hi Jace,
>
> What you call threads are actually processes because the fork call 
> creates a process.
>
> In each of these processes you have one main thread that runs node's event 
> loop. It is crucial to *never* block this thread. Blocking it may not look 
> so dramatic for computations that cooperate with other processes through 
> EMS calls but it is catastrophic if your node process is responding to 
> external events because these events won't get serviced while EMS 
> computations are blocked, waiting on empty/full conditions.
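
A minimal sketch of the starvation effect described above. `blockingWait` is a hypothetical stand-in for a blocking EMS call (it simply busy-waits) and is not part of EMS. The 0 ms timer plays the role of an external event.

```javascript
// Hypothetical stand-in for a blocking EMS call: busy-waits for `ms`
// milliseconds without ever returning control to the event loop.
function blockingWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin; no timers, sockets, or other callbacks can run meanwhile
  }
}

// Schedules a 0 ms timer (the "external event"), then blocks the main
// thread. Reports how late the timer was actually serviced.
function measureTimerDelay(blockMs, report) {
  const scheduledAt = Date.now();
  setTimeout(() => report(Date.now() - scheduledAt), 0);
  blockingWait(blockMs);
}

measureTimerDelay(100, (delayMs) => {
  console.log(`0 ms timer serviced after ${delayMs} ms`);
});
```

The timer cannot fire until the blocking call returns, so the reported delay is at least as long as the blocked period. An HTTP request arriving in that window would wait in the same way.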
>
> The classical way to implement callback-based APIs on top of blocking APIs 
> in node is to use the thread pool. You decompose every function into 3 
> pieces:
>
>    - A *start* function that gets called from the main loop. It creates a 
>    work item and passes it to the thread pool
>    - A *worker* function that processes the work item. This function is 
>    called from the thread pool. It executes the blocking call(s).
>    - An *after* function that gets called from the main loop after the 
>    worker function has returned. This function executes the callback.
>
> This is described in https://kkaefer.com/node-cpp-modules/#threadpool and 
> this is how many node binary modules are implemented (the one I know best 
> is https://github.com/joeferner/node-oracle). This is not too hard to 
> implement if you start from a good code pattern.
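
The three pieces can be sketched in plain JavaScript as an analogy. This assumes the `worker_threads` module found in current Node releases, which postdates this thread. Binary modules do the same thing in C++ with libuv's `uv_queue_work`. The function `startDoubleAsync` and the doubling "work" are hypothetical, and the sketch starts one worker per call instead of reusing a real pool.

```javascript
const { Worker } = require('worker_threads');

// worker: runs off the main thread, so blocking here is harmless.
// The blocking call is simulated by sleeping on a private buffer.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const cell = new Int32Array(new SharedArrayBuffer(4));
  Atomics.wait(cell, 0, 0, workerData.delayMs); // blocks this thread only
  parentPort.postMessage(workerData.value * 2);
`;

// start: called from the main loop; creates the work item and hands it off.
function startDoubleAsync(value, delayMs, callback) {
  const worker = new Worker(workerSource, {
    eval: true,
    workerData: { value, delayMs },
  });
  // after: runs back on the main loop once the worker has produced a result.
  worker.once('message', (result) => callback(null, result));
  worker.once('error', (err) => callback(err));
}

startDoubleAsync(21, 50, (err, result) => {
  if (err) throw err;
  console.log('result:', result);
});
console.log('main loop stays free while the worker blocks');
```

The start function returns immediately, so the main loop keeps servicing other events. Only the worker thread sits in the blocking wait.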
>
> Bruno
> On Friday, April 4, 2014 7:35:45 PM UTC+2, [email protected]:
>>
>> Hi Bruno,
>>
>> Yes, Full-Empty transitions are blocking operations.  The underlying 
>> principle is if multiple threads are sharing data which must be updated 
>> atomically, their execution is going to be sequential anyhow.  Data which 
>> is not contended for does not have a serialization penalty, so actual 
>> blocking only occurs with highly contended data.  Also, just to clarify: 
>> all threads participate in EMS parallel loops -- there isn't a "master" 
>> thread that is blocked waiting for a parallel loop to finish (although you 
>> can make one with a conditional that inspects ems.myID).
>>
>> I was already considering non-blocking alternatives for all the EMS 
>> intrinsics, specifically "try" versions which return an object 
>> containing the return value (if needed) and an indicator of whether the 
>> operation completed or not.  After thinking about it for a day or so, I 
>> agree getting a callback upon completion of a Full/Empty operation is more 
>> in the spirit of Node. However, I'm not sure of the best way to implement 
>> that.  
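
A rough sketch of what a "try" variant and a callback layered on top of it might look like. Everything here is hypothetical: `FECell`, `tryReadFE`, and `tryWriteEF` are in-process stand-ins for one full/empty cell and are not the EMS API. The callback version simply retries once per event-loop turn.

```javascript
// Hypothetical in-process stand-in for one full/empty cell (not EMS itself).
class FECell {
  constructor() {
    this.full = false;
    this.value = undefined;
  }

  // Non-blocking write: succeeds only if the cell is empty, then marks it full.
  tryWriteEF(value) {
    if (this.full) return { completed: false };
    this.value = value;
    this.full = true;
    return { completed: true };
  }

  // Non-blocking read: succeeds only if the cell is full, then marks it empty.
  tryReadFE() {
    if (!this.full) return { completed: false };
    const value = this.value;
    this.value = undefined;
    this.full = false;
    return { completed: true, value };
  }
}

// Callback variant built on the "try" call. It polls once per event-loop
// turn and always invokes the callback asynchronously.
function readFE(cell, callback) {
  setImmediate(function poll() {
    const attempt = cell.tryReadFE();
    if (attempt.completed) callback(null, attempt.value);
    else setImmediate(poll);
  });
}

const cell = new FECell();
readFE(cell, (err, value) => console.log('read', value));
setTimeout(() => cell.tryWriteEF('hello'), 10);
```

Polling keeps the event loop responsive but burns CPU while the cell stays empty, so this shows the interface shape and is not a recommendation for how to wait.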
>>
>> The challenge comes from the fact EMS threads do not communicate with 
>> each other except through data stored in memory, so there's no mechanism to 
>> send notification of an event from one thread to another.  The two choices 
>> for adding this mechanism are to build on Node's existing TCP 
>> messaging/event infrastructure, or to build one into EMS based on shared 
>> memory.
>>
>> The problem with the former is it can't keep up with millions of events 
>> per second, and the problem with the latter is that callbacks can only be 
>> discovered and executed when the program invokes another EMS operation, 
>> which would result in deadlocks: a thread waiting for a callback of old 
>> data can't call EMS for new data and thus never executes the callback that 
>> would unblock progress.  
>>
>> It's worth noting that EMS lets you over-subscribe the system so if you 
>> know some fraction of your threads will be blocked at any time, you can 
>> create more threads than cores to take advantage of the idle cores. 
>> Remember to set pinThreads to false so threads can move to idle cores: 
>> require('ems', false).
>>
>> I'll put more thought into how to implement callbacks in EMS. If you have 
>> any ideas, please let me know!
>>
>>              -J
>>
>>
>> On Thursday, April 3, 2014 10:23:35 AM UTC-7, Bruno Jouhier wrote:
>>>
>>> That's cool, but if I understand correctly, calls to EMS are blocking: they 
>>> wait on state transitions that get triggered by the processes that you 
>>> forked (the EMStransitionFEtag calls in ems.cpp). So EMS will let you 
>>> speed up computations by running them in parallel on forked processes but 
>>> your main loop will be blocked until the computation completes (you'll 
>>> be waiting on "full" markers that get set when results become available) 
>>> :-(.
>>>
>>>
>>> Any plans to have async variants of these calls that don't block but signal 
>>> their completion through a callback?
>>>
>>>
>>> Bruno
>>>
>>> On Tuesday, April 1, 2014 7:58:54 PM UTC+2, 
>>> [email protected] wrote:
>>>>
>>>> I just published an NPM package that adds shared memory parallelism, 
>>>> Transactional Memory, and fine-grained synchronization to Node:
>>>> GitHub: SyntheticSemantics/ems<https://github.com/SyntheticSemantics/ems>
>>>> NPM: ems <https://www.npmjs.org/package/ems>  or just: npm install ems
>>>>
>>>> It may not be exactly what you're looking for, but it is effective for 
>>>> jobs too large for one core but not large enough for a scalable cluster. 
>>>> The programming and execution model is somewhere between OpenMP 
>>>> multitasking and Partitioned Global Address Space (PGAS) tools.  It's 
>>>> built on Node's existing fork mechanisms so all legacy code and packages 
>>>> and node distributions work normally -- only Extended Memory Semantics 
>>>> (EMS) objects are shared between threads.
>>>>
>>>>            -J
>>>>
>>>>
>>>> On Monday, February 18, 2013 6:29:28 AM UTC-8, RF wrote:
>>>>>
>>>>> It seems that my first question is answered (yes - threads-a-gogo - 
>>>>> but without allowing shared mutable objects). 
>>>>> My second question is possibly redundant, then, but whether or not 
>>>>> this is a desirable feature would appear to be debatable.
>>>>>
>>>>> For what it's worth, I think having more choices is always a good 
>>>>> thing, although I would not argue that a true multi-threaded solution 
>>>>> should be integrated into Node core given its nature.
>>>>> The W16 project, from what I understand, is an experiment that 
>>>>> involves a modified V8 engine to allow multiple cores to be utilized 
>>>>> where each core shares a single common event loop from which events are 
>>>>> assigned and executed, using mutexes for synchronization issues.
>>>>>
>>>>> I think I've got what I needed to know.
>>>>> Thanks to all of you for the responses. In particular, that blog post 
>>>>> by Bruno was very informative.
>>>>>
>>>>> Regards,
>>>>> -Rob
>>>>>
>>>>> On Monday, 18 February 2013 00:15:48 UTC, RF wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm a CS student who is new to Node, and I have two questions:
>>>>>>
>>>>>>    1. Is there currently an existing mechanism (e.g. module, core 
>>>>>>    functionality) that allows Node applications to spawn multiple 
>>>>>>    threads (to take advantage of multiple cores for true parallelism)?
>>>>>>    2. If not, would this be a desirable feature?
>>>>>>
>>>>>> My understanding is that Node applications use a single thread to 
>>>>>> handle everything by queuing events on an event loop to be processed 
>>>>>> sequentially.
>>>>>> I also understand that this is the core feature that allows Node to 
>>>>>> grant efficiency gains for specific types of applications, and is the 
>>>>>> (main?) source of Node's popularity.
>>>>>>
>>>>>> Given this fact then (and assuming that it's correct), it would seem 
>>>>>> counter-intuitive to enable multi-threaded functionality in Node when 
>>>>>> there are other languages/frameworks available potentially more suited 
>>>>>> to multi-threaded behavior. 
>>>>>> However, an example use case that I'm thinking of is a situation 
>>>>>> whereby an existing Node application needs to be adapted or extended 
>>>>>> with some functionality that would benefit from true parallelism.
>>>>>> So, maybe 3 or 4 threads could be created that would handle 3 or 4 
>>>>>> tasks more efficiently than Node's existing sequential behavior, while 
>>>>>> still taking advantage of Node's established execution model in other 
>>>>>> areas of the application.
>>>>>>
>>>>>> I was thinking along the lines of creating a Node module that exposes 
>>>>>> an interface for creating threads and supplying them with the necessary 
>>>>>> function (and also some mechanisms for dealing with shared data 
>>>>>> concurrency and consensus issues).
>>>>>> I have searched unsuccessfully through available resources in an 
>>>>>> attempt to answer the above questions, so I'm hoping that someone can 
>>>>>> help me out.
>>>>>>
>>>>>> Regards,
>>>>>> -Rob
>>>>>>
>>>>>
