In case anyone else is curious about aedt's fibonacci benchmarks, I found an
explanation for the approx 6..8X time difference of C vs Nim (with gcc as a
backend, on Linux). It took digging into assembly to figure it out, though.
With the Nim version, the nested calls/various module boilerplate allow gcc to
unpack the top several entries of the recursion _at the call site_. It is not a
properly memoized/fully made iterative tail call, though as in aedt's editted
comment. (Both versions unpack a few levels in the fibonacci function itself).
So, really the doubly recursive call is invoked with a number 4 or 5 smaller
than the pure C implementation and so runs much faster since the running time
is exponential in N. So, this is mostly just good luck for Nim-gcc in this
highly specific benchmark, not something fundamental in the language or impls
that would generalize all that well. (I haven't figured out how to tweak the
pure C to get gcc to engage that optimization, but presumably someday the gcc
folks will do so behind the scenes or maybe there's already a magic -f flag I'm
unaware of.)