I have done extensive benchmarking of various dispatching techniques in Nim
with a toy 7-instruction VM.
Results are the following:

```
# interp_switch took 8.604712000000003s for 1000000000 instructions: 116.2153945419672 Mips (M instructions/s)
# interp_cgoto took 7.367597000000004s for 1000000000 instructions: 135.7294651159665 Mips (M instructions/s)
# interp_ftable took 8.957571000000002s for 1000000000 instructions: 111.6374070604631 Mips (M instructions/s)
# interp_handlers took 11.039072s for 1000000000 instructions: 90.58732473164413 Mips (M instructions/s)
# interp_methods took 23.359635s for 1000000000 instructions: 42.80888806695823 Mips (M instructions/s)
```
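
To make the setup concrete, here is a minimal sketch of what a `case`-based ("switch") toy VM can look like in Nim. The `Op` enum, its seven accumulator opcodes and `runSwitch` are hypothetical illustrations, not the code that produced the numbers above. The C backend typically compiles a `case` over an enum into a C `switch`, which the C compiler usually turns into a jump table with a single shared indirect jump at the top of the loop.

```nim
# Hypothetical 7-opcode accumulator VM, for illustration only.
type
  Op = enum
    opHalt  # stop and return the accumulator
    opInc   # acc += 1
    opDec   # acc -= 1
    opMul2  # acc *= 2
    opDiv2  # acc = acc div 2
    opAdd7  # acc += 7
    opNeg   # acc = -acc

proc runSwitch(code: seq[Op]): int =
  ## Plain `case`-in-a-loop dispatch: every opcode goes back through
  ## the same `case` at the top of the loop.
  var pc = 0
  var acc = 0
  while true:
    let op = code[pc]   # fetch
    inc pc
    case op             # decode + dispatch
    of opHalt: break    # leaves the `while` loop
    of opInc: inc acc
    of opDec: dec acc
    of opMul2: acc = acc * 2
    of opDiv2: acc = acc div 2
    of opAdd7: acc += 7
    of opNeg: acc = -acc
  result = acc

when isMainModule:
  echo runSwitch(@[opInc, opAdd7, opMul2, opNeg, opDec, opHalt])  # prints -17
```

To get figures like the ones above, feed it a long program and divide the instruction count by the elapsed time (for example measured with `getMonoTime()` from `std/monotimes`).
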
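@Araq is right: the main advantage of computed gotos is that they make better
use of the hardware indirect branch predictor when your case statement runs in
a loop. With a plain switch, every opcode is dispatched through the same
indirect jump at the top of the loop; with computed gotos, each handler ends
with its own indirect jump, so the predictor can learn which opcode tends to
follow which. Using a table instead would be a guaranteed cache miss.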
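
In Nim you get computed gotos from the same loop by adding the `{.computedGoto.}` pragma as a statement inside a `while true` loop whose body contains a `case` over an enum. As far as I know, the compiler requires that `case` to be exhaustive without an `else` branch, and the pragma is simply ignored when the C compiler does not support the computed goto extension (GCC and Clang do). The opcodes and `runCgoto` below are the same hypothetical ones as in the switch sketch, not the benchmarked VM.

```nim
# Same hypothetical 7-opcode accumulator VM, dispatched with computed gotos.
type
  Op = enum
    opHalt, opInc, opDec, opMul2, opDiv2, opAdd7, opNeg

proc runCgoto(code: seq[Op]): int =
  var pc = 0
  var acc = 0
  while true:
    {.computedGoto.}    # ask the C backend for a label table instead of a switch
    let op = code[pc]   # fetch
    inc pc
    # Exhaustive case with no `else` branch, as computedGoto requires.
    case op
    of opHalt: break    # leaves the `while` loop
    of opInc: inc acc
    of opDec: dec acc
    of opMul2: acc = acc * 2
    of opDiv2: acc = acc div 2
    of opAdd7: acc += 7
    of opNeg: acc = -acc
  result = acc

when isMainModule:
  echo runCgoto(@[opAdd7, opAdd7, opDiv2, opNeg, opHalt])  # prints -7
```

Compile with `nim c -d:release` and inspect the generated C in the nimcache directory: you should see a `static void*` array of label addresses and a `goto *` at the end of every branch, rather than one `switch`. From what I can tell, Nim also copies the statements preceding the `case` (here the fetch and `inc pc`) into the end of each branch, which is what gives every handler its own dispatch jump.
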

Besides the indirect branch predictor, there are also hardware predictors for:
* Linear or straight-line code
* Conditional branches
* Calls and returns (via a return address stack)

In assembly, a computed goto compiles to an indirect jump. If we could generate
`call` and `ret` instructions instead (without pushing and popping function
parameters on the stack), we could get even better performance, since the
calls are direct and the returns are predicted by the call/return predictor;
see the [Context Threading section](https://github.com/status-im/nimbus/wiki/Interpreter-optimization-resources#context-threading).