In case anyone else is curious about aedt's fibonacci benchmarks, I found an 
explanation for the approx 6..8X time difference of C vs Nim (with gcc as a 
backend, on Linux). It took digging into assembly to figure it out, though. 
With the Nim version, the nested calls/various module boilerplate allow gcc to 
unpack the top several entries of the recursion _at the call site_. It is not a 
properly memoized/fully made iterative tail call, though as in aedt's editted 
comment. (Both versions unpack a few levels in the fibonacci function itself). 
So, really the doubly recursive call is invoked with a number 4 or 5 smaller 
than the pure C implementation and so runs much faster since the running time 
is exponential in N. So, this is mostly just good luck for Nim-gcc in this 
highly specific benchmark, not something fundamental in the language or impls 
that would generalize all that well. (I haven't figured out how to tweak the 
pure C to get gcc to engage that optimization, but presumably someday the gcc 
folks will do so behind the scenes or maybe there's already a magic -f flag I'm 
unaware of.)

Reply via email to