On Fri, 2007-08-24 at 09:44 +0200, Nicolas Cannasse wrote:
> > Felix uses the higher level compiler trick, and generates high
> > performance machine binaries. There's no reason for coroutines
> > to impose any particularly onerous overhead.
> 
> I haven't been studying the pros and cons of high- and low-level
> approaches, so I'm not qualified to compare how the two would apply
> to Neko. However, I tend to favor high-level approaches when they
> are possible, since they are less tricky to implement and more
> cross-platform.

I haven't studied Neko so I can only guess what it does :)

> > A JIT for a VM would not be using the machine stack for
> > user data anyhow: it would be using the machine stack only
> > for transient temporary values. The JIT has to respect the
> > data structures used by VM-emulation (otherwise JIT generated
> > machine code won't interoperate with interpreted code).
> 
> Well, this can be discussed.
> 
> Currently the Neko JIT is using the VM stack, but it could also use the
> C stack instead, and only push back values on the VM stack when calling
> a bytecode primitive.
> 
> This is possible because JIT is based on a per-function basis, and
> functions can't access outside of their local stack in Neko.

Where does the caller's "return address" go when you call a function?

Felix does this in effect:

struct con_t { 
        int pc;          // resume point: the saved "program counter"
        con_t *caller;   // the caller's frame: the call stack lives on the heap
        arg_t arg;       // this frame's arguments (arg_t is schematic)

        con_t *call(con_t *caller_a, arg_t arg_a) {
                pc = 0; caller = caller_a; arg = arg_a;
                return this;
        }

        // subclasses (one per generated procedure) override this
        virtual con_t *resume() {       
                switch (pc) {
                case 0:
                        // ... initial code ...
                        // SUBROUTINE CALL 
                        pc = 1; // return address
                        return (new subroutine())->call(this, arg);
                case 1:
                        // ... code after the call returns ...
                        // RETURN
                        return caller;
                }
                return NULL; // not reached
        }
};

and the driver looks like:

        // root_proc is a con_t subclass; a NULL caller means "return to driver"
        con_t *p = (new root_proc())->call(NULL, arg_t());
        while (p) p = p->resume();

What this does is maintain the call stack on the heap.
Each object is a continuation, pointing to its caller's stack frame.

With gcc, a computed goto on a machine address is used instead 
of the switch on an integer case number.
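
For the curious, here is a minimal sketch of that variant using
gcc's labels-as-values extension (my paraphrase, not the actual
Felix output): pc becomes a void* holding a label address.

        struct con_t {
                void *pc;         // label address instead of a case number
                con_t *caller;
                arg_t arg;

                con_t *call(con_t *caller_a, arg_t arg_a) {
                        pc = NULL; caller = caller_a; arg = arg_a;
                        return this;
                }

                con_t *resume() {
                        if (pc) goto *pc;  // jump straight to the resume point
                        // ... initial code ...
                        pc = &&resume_1;   // "return address" is a label address
                        return (new subroutine())->call(this, arg);
                resume_1:
                        // ... code after the call returns ...
                        return caller;
                }
        };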

So, switching context is a matter of switching the pointer p
between all the current continuations (threads = coroutines).
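
For example, a hypothetical round-robin extension of the driver
above (the ready list and its handling are my own sketch, not
Felix's actual scheduler):

        #include <deque>

        std::deque<con_t*> ready;          // all runnable continuations
        ready.push_back((new root_proc())->call(NULL, arg_t()));

        while (!ready.empty()) {
                con_t *p = ready.front();  // pick a thread
                ready.pop_front();
                p = p->resume();           // run it for one step
                if (p) ready.push_back(p); // reschedule its continuation
        }

This sketch interleaves at every resume; the real scheduler only
switches at service calls, described below.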

Felix uses 'service calls' and a service-call flag on top of the
above to escape the while loop, but I won't describe that here.
Suffice it to say, exchange of control is managed
by the client issuing service calls to read and write channels.
[Note .. a service "call" is of course actually a yield .. :]

An unbalanced read or write on a channel pops the requesting
thread off the scheduler list, whereas a balanced one pushes
both the reader(writer) and matching writer(reader) onto
the scheduler list.
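
In code, the rendezvous might look something like this (a sketch
with my own names, not the actual Felix runtime API; write is the
mirror image of read):

        #include <deque>

        typedef std::deque<con_t*> sched_t;  // the scheduler's ready list

        struct channel_t {
                std::deque<con_t*> readers;  // suspended readers
                std::deque<con_t*> writers;  // suspended writers
        };

        void request_read(sched_t &ready, channel_t &chan, con_t *reader) {
                if (chan.writers.empty()) {
                        // unbalanced: park the reader on the channel;
                        // it stays off the scheduler list
                        chan.readers.push_back(reader);
                } else {
                        // balanced: wake the matching writer and push
                        // both partners onto the scheduler list
                        con_t *writer = chan.writers.front();
                        chan.writers.pop_front();
                        ready.push_back(reader);
                        ready.push_back(writer);
                }
        }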

Channels are created by the application and owned by 
whoever has a pointer to them. If all the owners of a channel
are suspended waiting for service, then none will ever come --
and there are no longer any pointers to these routines (threads).

So the garbage collector just disposes of them .. it's very cool.

The system cannot deadlock, in the sense that threads which are
necessarily locked up (hanging on channels that can never be
serviced because no one else knows them) are unreachable.

Of course starvation is possible and there is absolutely no
fairness; on the other hand there is little need for locking,
though the cooperating threads do have to run on a single
processor.

The same mechanism can be (and often is) used for any VM which
has a VM stack: just swap the stack (a sketch follows the list
below). This would work for machine stacks too, except:

        (a) swapping machine stacks is unreliable, except
                by using native pre-emptive threading

        (b) machine stacks on OSes like Unix have bounded
                pre-allocated address space (which runs out
                fast on a 32-bit machine)
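
Here is what "just swap the stack" amounts to, as a sketch
(value_t, vm_t and friends are hypothetical names, not Neko's):

        #include <cstddef>

        typedef long value_t;                // a VM value (schematic)

        struct vm_thread_t {                 // one green thread
                value_t *stack;              // heap-allocated value stack
                value_t *sp;                 // saved stack pointer
                size_t   pc;                 // saved bytecode counter
        };

        struct vm_t { value_t *stack, *sp; size_t pc; };

        // A context switch is just pointer surgery on heap data;
        // the machine stack is never touched, which is why this is
        // both portable and cheap.
        void context_switch(vm_t &vm, vm_thread_t *from, vm_thread_t *to) {
                from->stack = vm.stack; from->sp = vm.sp; from->pc = vm.pc;
                vm.stack = to->stack;   vm.sp = to->sp;   vm.pc = to->pc;
        }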

As you can see, the principal constraint for the emulated
stack is simply that the machine stack is empty at the point
control is yielded to the driver. This is almost invariably
the case in a bytecode interpreter anyhow .. unless primitives
invoke the interpreter directly and recursively. In that case,
you need to arrange for a service call to spawn the invocation
instead.

BTW: the swapping of 'call' with 'yield' is known as 'control
inversion' because the master/slave caller/callee relationship
is inverted. The driver loop 'reinverts' control to effect the
intended semantics.

Almost all (bytecode) interpreters control-invert already.
The magic of Felix is that it makes high-performance machine
code do it transparently.

But the bottom line for Neko is that it shouldn't be all
that hard to implement for the bytecode interpreter/VM,
and the JIT shouldn't be any major hassle. The big PLUS for
bytecode/VM compared with native code is that there is no
*extra* overhead in effecting the control inversion,
since bytecode interpretation already requires primitives
to return on a piecemeal basis.

A VM -- and the OCaml bytecode interpreter does this -- can
also context switch at yield points "pre-emptively", either
by an emulated timer (in Python, a counter checked every so
many bytecode instructions -- see sys.setcheckinterval), or
driven by actual OS pre-emptions (as in the OCaml VM) if
they're available.
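
As a sketch (hypothetical names again, reusing the vm_t above):
decrement a counter on each dispatched instruction and yield to
the scheduler when it hits zero.

        enum { CHECK_INTERVAL = 100 };   // instructions between checks

        // execute_one and yield_to_scheduler are schematic stand-ins
        // for the VM's dispatch and context-switch machinery.
        void interpret(vm_t &vm) {
                int ticks = CHECK_INTERVAL;
                for (;;) {
                        execute_one(vm);          // dispatch one bytecode
                        if (--ticks == 0) {       // emulated timer expired
                                ticks = CHECK_INTERVAL;
                                yield_to_scheduler(vm);  // possible switch point
                        }
                }
        }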


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
