I can't find it right now, and I might be mistaken, but IIRC Mike Pall (of 
LuaJIT fame) managed to make this call/ret trickery obsolete in the LuaJIT2 
interpreter by pipelining the decoding of the next instruction so that it 
overlaps with the fetching of the next address, along the lines described in 
[https://nominolo.blogspot.com/2012/07/implementing-fast-interpreters.html](https://nominolo.blogspot.com/2012/07/implementing-fast-interpreters.html).
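To make the idea concrete, here is a rough C sketch of what I mean by overlapping. It is not LuaJIT's code (that interpreter is hand-written assembly); the instruction layout, the opcodes, and the names `INS`, `run` and `demo` are all made up for illustration. The point is only that the operand fields are decoded independently of the dispatch on the opcode, and that the load of the next instruction word is issued before the current instruction's work, so the CPU has a chance to do those things in parallel.

```c
#include <stdint.h>

/* Hypothetical instruction word: opcode in bits 0-7, A in bits 8-15,
   D in bits 16-31. This layout is an assumption for the sketch. */
enum { OP_LOADI, OP_ADD, OP_HALT };

#define INS(op, a, d) \
    ((uint32_t)(op) | ((uint32_t)(a) << 8) | ((uint32_t)(d) << 16))

/* Runs until OP_HALT and returns register A of that instruction.
   Assumes a well-formed program that ends with OP_HALT. */
static int32_t run(const uint32_t *pc, int32_t *reg) {
    uint32_t ins = *pc++;                 /* first instruction word */
    for (;;) {
        /* Decode all fields up front. A and D do not depend on the
           dispatch on op, so they can be computed alongside it. */
        uint32_t op = ins & 0xffu;
        uint32_t a  = (ins >> 8) & 0xffu;
        uint32_t d  = ins >> 16;

        if (op == OP_HALT)
            return reg[a];

        /* Issue the load of the next word before doing this
           instruction's work. */
        ins = *pc++;

        switch (op) {
        case OP_LOADI: reg[a] = (int32_t)d;       break;
        case OP_ADD:   reg[a] += reg[d & 0xffu];  break;
        default:       return -1;                 /* unknown opcode */
        }
    }
}

/* Small usage example: r0 = 40; r1 = 2; r0 += r1; halt with r0. */
static int32_t demo(void) {
    static const uint32_t code[] = {
        INS(OP_LOADI, 0, 40),
        INS(OP_LOADI, 1, 2),
        INS(OP_ADD, 0, 1),
        INS(OP_HALT, 0, 0),
    };
    int32_t reg[256] = {0};
    return run(code, reg);                /* returns 42 */
}
```

A C compiler is free to reorder all of this, which is part of why I'd expect the real thing to need assembly to get the scheduling it wants.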

In my opinion, if you have to resort to machine code (which the context 
threading solution does), then you might as well spend a little more effort to 
overlap the operations; it will be obsolete within 10 years either way thanks 
to architecture differences. Their solution does port more easily to more 
architectures, I'll give them that, but the question is "how fast can you go", 
not "how fast can you go for only 400 implementation lines".
