> If array index is a constant, then offset is constant too. If array index is
> a variable, then offset is the product of element_size and index. So there is a
> runtime mul in the latter case.
This is a very interesting point.
Given that I'm currently building a stack machine bytecode interpreter (to
replace my AST-walking interpreter) and am already seeing a serious
performance improvement, I'm trying to use every trick available, so these
subtle details do matter.
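To make the quoted point concrete, here is a minimal sketch (the `demo` array and the `elemOffset` helper are made up for illustration and are not part of the interpreter). It only checks the arithmetic: the address of an element is the base address plus index * sizeof(element). Whether the compiler folds the constant case or emits an actual multiply depends on the backend and flags. For an 8-byte element like `uint64` the scaling is typically a shift or a scaled addressing mode rather than a real `mul`, but that is an assumption worth checking in the generated assembly.

```nim
# Hypothetical demo: byte offset of demo[i] == i * sizeof(element).
var demo: array[8, uint64]

proc elemOffset(i: int): int =
  ## Byte offset of demo[i] from demo[0], measured from the real addresses.
  cast[int](addr demo[i]) - cast[int](addr demo[0])

const K = 3                                   # index known at compile time
doAssert elemOffset(K) == K * sizeof(uint64)  # 3 * 8 bytes

var i = 5                                     # index only known at run time
doAssert elemOffset(i) == i * sizeof(uint64)  # 5 * 8 bytes, scaled at run time
```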
Currently, my simple "stack" looks pretty much like this:
#[######################################################
    Constants
  ======================================================]#

const
    MAX_STACK_SIZE  = 10_000_000

#[######################################################
    Types
  ======================================================]#

type
    Stack[T]    = array[MAX_STACK_SIZE,T]
    Value       = uint64

#[######################################################
    Global variables
  ======================================================]#

var
    MainStack*  : Stack[Value]  # my main stack
    MSP*        : int           # pointer to the last element

#[######################################################
    Implementation
  ======================================================]#

template push*(v: Value)        = inc(MSP); MainStack[MSP] = v
template pop*(): Value          = dec(MSP); MainStack[MSP+1]
template popN*(x: int)          = dec(MSP,x)
template top*(x:int=0): Value   = MainStack[MSP-x]
So... normally an **ADD** instruction, in my {.computedGoto.} interpreter loop,
would be something like this:
case OpCode
    # other cases
    of ADD_OP: push(pop()+pop()); inc(ip)
    # inc(MSP); MainStack[MSP] = ((dec(MSP); MainStack[MSP+1]) + (dec(MSP); MainStack[MSP+1]))
    # ...
which I'm optimizing further (I think... lol) by doing it like this:
case OpCode
    # other cases
    of ADD_OP: top(1) = top(0)+top(1); dec(MSP); inc(ip)
    # MainStack[MSP-1] = MainStack[MSP]+MainStack[MSP-1]; dec(MSP)
    # ...
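For comparison, here is a small self-contained sketch of both ADD variants that checks they leave the same result and stack pointer behind. The names (`DemoStackSize`, `stack`, `sp`, `addViaPushPop`, `addInPlace`) and the tiny stack size are assumptions made for the demo, not the real interpreter. One thing that may be worth double-checking: as far as I understand Nim templates, arguments are substituted at the point of use rather than evaluated up front, so in `push(pop()+pop())` the `inc(MSP)` inside `push` would run before the two pops (the expansion in the comment above suggests the same). The sketch sidesteps this by binding the argument to a local before touching the stack pointer.

```nim
const DemoStackSize = 16        # hypothetical, deliberately small
type Value = uint64

var
  stack: array[DemoStackSize, Value]
  sp = 0                        # index of the last pushed element (slot 0 unused)

template push(v: Value) =
  let tmp = v                   # evaluate the argument first ...
  inc(sp)                       # ... then move the stack pointer
  stack[sp] = tmp

template pop(): Value =
  dec(sp)
  stack[sp + 1]

proc addViaPushPop() =
  # three stack-pointer updates (two dec, one inc), two loads, one store
  push(pop() + pop())

proc addInPlace() =
  # one stack-pointer update, two loads, one store
  stack[sp - 1] = stack[sp] + stack[sp - 1]
  dec(sp)

push(2'u64)
push(3'u64)
addViaPushPop()
let viaPushPop = stack[sp]
let spAfterPushPop = sp

sp = 0                          # reset and repeat with the in-place variant
push(2'u64)
push(3'u64)
addInPlace()
let inPlace = stack[sp]
let spAfterInPlace = sp

doAssert viaPushPop == inPlace
doAssert spAfterPushPop == spAfterInPlace
```

Both variants do the same loads and stores; the in-place one just saves two stack-pointer updates. Whether that is measurable after the C compiler has optimized the loop is something only a benchmark on the real interpreter can tell.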
So... lots of different things going on...
Any ideas to make it better (and more performant) are more than welcome! :)