On Wed, 2007-08-22 at 13:36 -0500, Matt Campbell wrote:
> Nicolas Cannasse wrote:
> > Yes, that would be possible.
>
> So do you think it would be desirable to have coroutines at the VM
> level? I see this dilemma with VM-level coroutines: If you implement
> them as Lua does, with only a VM stack per coroutine, then the
> implementation is portable and has low memory usage per coroutine but
> can't be used with JIT; but if you implement them as LuaJIT and the Io
> language do, with a CPU-level stack per coroutine, then this requires
> platform-specific magic and (I think) has higher memory usage per
> coroutine. By implementing coroutines in a higher-level compiler and a
> runtime library, as JavaScript Strands does, this dilemma would be
> avoided. Maybe haXe could have optional coroutines; I would make this
> an option configurable at compile time, because coroutines as
> implemented by JS Strands would probably introduce unacceptable overhead
> in some cases.
Felix uses the higher-level compiler trick, and generates
high-performance machine binaries. There's no reason for
coroutines to impose any particularly onerous overhead.
A JIT for a VM would not be using the machine stack for
user data anyhow: it would be using the machine stack only
for transient temporary values. The JIT has to respect the
data structures used by VM-emulation (otherwise JIT generated
machine code won't interoperate with interpreted code).
Coroutines use hardly any store. In Felix:
  channel:  one bit flag + linked list of threads
            = one word cost
  thread:   pointer to current continuation
            = one word cost
  linked list of threads: two words per node
  scheduler_queue: linked list of threads
  continuation:
    caller return continuation: one word
    current frame address:      one word
    service request pointer:    one word
There is extra store in the continuation object for every
piece of data it needs to access (i.e. local variables).
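To make those costs concrete, here is a rough C sketch of what
such a layout could look like. The struct and field names are my
own illustration, not the actual Felix runtime definitions:

    /* Illustrative layout only -- names and fields are assumptions,
       not the real Felix runtime definitions. */

    struct fthread;
    struct continuation;

    struct channel {
        /* one word: a one-bit read/write flag plus the head of a
           linked list of waiting threads (flag packed into the pointer) */
        struct fthread *waiting;
    };

    struct fthread {
        /* one word: a thread is just a pointer to its current continuation */
        struct continuation *current;
    };

    struct thread_node {
        /* two words per node, used for channel queues and the scheduler queue */
        struct fthread     *thread;
        struct thread_node *next;
    };

    struct continuation {
        struct continuation *caller;  /* caller's return continuation: one word */
        void                *pc;      /* current frame address: one word */
        void                *svc;     /* service request pointer: one word */
        /* plus one slot per local variable this frame needs to access */
    };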
Threads communicate over channels, through global data, or via
pointers passed as arguments when they're constructed.
Each continuation consists of flat code with only gotos,
PLUS two special instructions:
  CALL:   returns a new continuation to execute
  RETURN: returns the caller's continuation
The driver code executes these by simply resuming
the returned object (until NULL is returned).
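A minimal sketch of that driver protocol in C might look like
this (the names resume, run_thread and make_callee are assumed
for illustration, not Felix or Neko source):

    /* Sketch only: resume(), run_thread() and make_callee() are
       assumed names used to illustrate the protocol above. */

    struct continuation {
        struct continuation *caller;  /* return continuation (heap-linked "stack") */
        int                  pc;      /* where to re-enter the flat code */
        /* ... locals for this frame ... */
    };

    /* Runs the flat body of k from k->pc; returns the callee on CALL,
       k->caller on RETURN, or NULL when the thread has finished. */
    struct continuation *resume(struct continuation *k);

    /* Allocates a callee frame whose caller field points back at k. */
    struct continuation *make_callee(struct continuation *k);

    /* The driver: keep resuming whatever comes back until NULL. */
    void run_thread(struct continuation *top)
    {
        struct continuation *k = top;
        while (k != NULL)
            k = resume(k);
    }

    /* A possible flat body: CALL and RETURN both work by *returning*. */
    struct continuation *example_body(struct continuation *self)
    {
        switch (self->pc) {
        case 0:
            self->pc = 1;               /* remember where to resume */
            return make_callee(self);   /* CALL: run the callee next */
        case 1:
            return self->caller;        /* RETURN: back to the caller */
        }
        return NULL;                    /* thread finished */
    }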
Although Felix generates C++ classes to do all this, and
has a lot of sophisticated optimisations to eliminate most
of the CALL/RETURN overhead (e.g. inlining), a VM could use
a simple mechanical layout, and the JIT would respect this
too: the JIT doesn't handle thread interleaving; it just
generates the flat code of the continuation body.
In particular, note that CALL and RETURN work by *returning*
control, and so can ONLY be used when the machine stack is
empty. Instead, the call stack consists of heap-allocated
objects linked together, which the GC cleans up.
This mechanism has been tested on my AMD64 3200 box with 1G
of RAM, and can handle in excess of a million threads at a
rate of over 500K transactions per second.
A VM implementation should be just as fast IMHO, since the
machine code Felix generates just follows an abstract
protocol anyhow. The only difference for Neko would be that
you'd be emulating the C++ with C: e.g. the 'resume()' method
of each continuation would be a function pointer (instead of
using C++-generated virtual table dispatch).
In the JIT, this pointer would point to generated machine code.
In the VM emulation, it would point to a thunk that invokes
the VM on the remaining data, which would be bytecode
(or something similar).
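As a hedged sketch of that C emulation (vm_interpret and the
field names are hypothetical, nothing here is from the Neko
source):

    /* Assumed sketch: emulating the C++ virtual resume() with a plain
       function pointer. None of these names come from Neko or Felix. */

    struct continuation {
        /* plays the role of the virtual resume() method */
        struct continuation *(*resume)(struct continuation *self);
        struct continuation  *caller;
        /* ... frame data ... */
    };

    /* An interpreted frame: the pointer targets a thunk that feeds
       this frame's bytecode to the VM loop. */
    struct vm_frame {
        struct continuation   base;      /* must come first */
        const unsigned char  *bytecode;  /* this frame's code */
        int                   pc;
    };

    /* Hypothetical interpreter entry point: runs bytecode from f->pc
       and returns the next continuation (callee, caller, or NULL). */
    struct continuation *vm_interpret(struct vm_frame *f);

    struct continuation *vm_resume(struct continuation *self)
    {
        return vm_interpret((struct vm_frame *)self);
    }

    /* A JITted frame: the same pointer simply targets the generated
       machine code for the frame's body, which follows the same
       protocol, so JITted and interpreted frames interoperate. */

Either way the driver only ever calls k->resume(k), so it never
needs to know which kind of frame it has.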
--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net
--
Neko : One VM to run them all
(http://nekovm.org)