Thanks for engaging on this.

That does make sense, and yeah, my model for when JS code runs in Node is 
basically "in response to some asynchronous event": which could be I/O, or 
also things like setTimeout or nextTick (or the timers module). The node 
process will be asleep in the libuv event loop, wake up in order to run 
some callback, and the code in that callback will get to run until it's 
done (including anything it calls synchronously, which doesn't require I/O 
or get deferred via nextTick or timers); it shouldn't get preempted; 
eventually this sequence of synchronous execution will be done and Node 
will find something else to do (another timer or event handler) or go back 
to sleep.

Upon more experimentation, I don't think this has anything to do with 
node-fibers (more on this below); I just mentioned that because we are 
using it and it might affect things, but I no longer think it's relevant 
here. I should also have mentioned we're using node-weak 
<https://github.com/TooTallNate/node-weak>, which is going to be relevant 
here (it's one way but not the only way of getting the behavior in 
question). What this boils down to is, I have a fundamental question about 
the design and intended use of MakeCallback, whether I'm right about the 
behavior I'm going to describe, and whether this is intentional.

I think I can further boil down my understanding of Node's asynchronous 
concurrency model and consistency guarantees and need for locking (or not) 
to: "no preemption". At least, that's my read on what people want from 
Node, and what I want from Node. (Following on what I said before about 
every synchronous invocation being a critical section until it chooses to 
yield: State can't change out from under you in the middle of one 
synchronous invocation. If you yield and then resume execution, "on the 
other side of an asynchronous I/O operation" as you said, state can and 
does change out from under you, because if code A schedules callback B, it 
can't know what will run in between A and B. And that's fine, because we 
all know it works that way, and B can't make any assumptions about the 
state of the world being exactly as A left it. But each statement in A 
should be able to assume that the state of the world is exactly as the 
previous statement left it.)

This "no preemption" property would require that if I'm running some 
synchronous code, it gets to decide what other code runs, until it decides 
to yield execution by returning to the event dispatcher. In the middle of 
what I'm calling a flow of synchronous execution, some other piece of code 
it doesn't know about can't jump on the CPU, change some data structures, 
and return control to the first code in an inconsistent state. For this to 
be true, we should only do event dispatch (calling callbacks) from the 
toplevel event loop; there shouldn't be contexts nested within arbitrary 
application code that can dispatch into other arbitrary code.

MakeCallback is where Node invokes nextTick callbacks, and it is called 
from the toplevel event loop (well, right before returning to the libuv 
event loop), but if my understanding is correct, it's also called on every 
transition from JS code back to C++ code -- which is a superset of 
"returning to the libuv event loop". That is, if "outer" JS code calls into 
a native-code Node extension, and that native-code extension calls back 
into "inner" JS via MakeCallback, when that inner JS returns to C++, 
MakeCallback invokes process._tickCallback before returning, and the outer 
JS has been preempted by any nextTick callbacks that get invoked here, and 
it can't really reason about the state of the world (specifically, any 
consistency guarantees on shared data structures) after that. (Side 
question here: is MakeCallback *a* way for extension/C++ code to invoke JS 
code, or *the* way for extension/C++ code to invoke JS code? That is, from 
a native-code extension that wants to invoke JS code, is all JS code 
"callbacks" or only some of it?)

This, to me, seems to break the entire Node concurrency model, and I'm 
surprised more people haven't been bitten by it, but maybe I'm 
misunderstanding things, maybe there's a good reason for it, maybe 
MakeCallback isn't supposed to get used this way, or maybe it's just a bug 
that's enough of a corner case that it hasn't been noticed.

Let me restate the problematic flow, and then I'll get into examples. A 
normal Node app might start off as pure JS; outside Node's own codebase all 
the app code is JS running as event handlers, and the flow is (1) event 
loop -> (2) handlers in JS -> return to (1). Then you add some native-code 
Node extensions, and JS code can call out to C++ code in these extensions, 
and now the flow is (1) event loop -> (2) handlers in JS -> (3) callouts in 
C++ extension -> return to (2) -> return to (1). And say some of these 
native-code Node extensions want to execute JS and use MakeCallback to do 
it (again, I don't know that this is the only way native code gets to 
execute JS code, but I do know it's one way); and now the flow is (1) event 
loop -> (2) handlers in JS -> (3) callouts in C++ extension -> (4) more JS 
code -> return to (3) -> return to (2) -> return to (1). And obviously more 
complex flows are possible; there could be even deeper nested transitions 
between JS and native code, or bounce between (2)<>(3) multiple times 
before returning to (1), but the 1234321 case is already enough to 
illustrate the preemption problem. Every time we invoke MakeCallback, on 
the way out it executes process._tickCallback. This happens at the boundary 
from (2) back to (1), which is not a problem, but also at the boundary from 
(4) back to (3), which is a problem.

Here's a demo of the problem 
<https://gist.github.com/metamatt/99ef2a5e12de2acd5e39>. (Note that this 
doesn't use node-fibers. It does use node-weak, but just as a way of 
forcing invocation of MakeCallback. Any other extension that calls 
MakeCallback could exhibit the same behavior.)

Here are examples of node-fibers usage, including examples of what the call 
stacks look like <https://gist.github.com/metamatt/2465dfd76213a98e22e0>. 
(Again, not relevant to the preemption issue I'm harping on; just following 
up on what would otherwise be a loose end from my original post, if 
anyone's curious. The point is that the stack trace inside a fiber doesn't 
see outside the fiber boundary, unless you write your own stack-stitching 
code, so you don't see _tickCallback in the call stack even if it was used 
to invoke the fiber. But you still wouldn't expect to see _tickCallback 
higher in the call stack.) Using fibers, one certainly ends up with a 
different concurrency model than the normal Node one, and you can bring all 
sorts of problems on yourself this way if you do it wrong, but I don't 
consider it to break my "no preemption" rule because it's entirely under 
your control; you can get descheduled and then find the world has changed 
out from under you when you get rescheduled, but you actually have to ask 
for this to happen by calling Fiber.yield or Fiber.run.

Boiling all this back down to one question, it's this: is it intentional 
that inner invocations of MakeCallback do dispatch to _tickCallback (thus 
preempting any code that was running higher on the call stack), or can this 
be changed? IMHO it would be preferable, and would solve this problem, if 
we called process._tickCallback only from the very outermost transition 
from JS back to C++, not on every such transition. But maybe this causes 
other problems I'm not seeing. Or, should extensions that are invoked as 
callouts from JS code (as opposed to registering event handlers, which 
seems to be how the Node source uses MakeCallback) not be using 
MakeCallback in the first place?

As things stand, it seems like any extension that uses MakeCallback should 
be labeled "somewhat dangerous", because it can preempt the code that calls 
it. (But if this is documented and advertised properly, I as a client of 
that extension can code around it; I just have to realize that certain 
entry points into the extension are a yield/redispatch point from the point 
of view of the calling code, and if I know where those are, I can code 
around it.) Meanwhile node-weak (and I don't mean to malign it, it's really 
cool and useful functionality) should be labeled "quite a lot more 
dangerous", because not only can it cause preemption for those same 
reasons, but you can't document any specific entry points or surface area 
that can cause that preemption: that surface area is GC, which can happen 
at any time.

thanks,

Matt

-- 
Job board: http://jobs.nodejs.org/
New group rules: 
https://gist.github.com/othiym23/9886289#file-moderation-policy-md
Old group rules: 
https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
--- 
You received this message because you are subscribed to the Google Groups 
"nodejs" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/nodejs/b045619f-86a5-4476-b542-885425347cdc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to