Thanks for the quick feedback; I appreciate it. I had indeed somehow missed that Julia's arrays are column-major, and taking that into account yields essentially equal performance between Numba and Julia in this benchmark. I'll compare both against Fortran/C++ just to see how large the remaining difference is, but the current performance does seem to be close to what LLVM-based approaches can achieve in general.
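For anyone who runs into the same mistake: the fix is making sure the index that varies fastest in the inner loop is the one that is contiguous in memory. NumPy arrays (and therefore Numba code operating on them) default to row-major ("C") order, whereas Julia arrays are column-major, so the same nested loop order that is stride-1 in one language strides across memory in the other. Below is a minimal pure-Python sketch of that idea; it only models flat memory offsets rather than timing real Numba or Julia code, and the helper names are hypothetical, chosen for illustration.

```python
# Sketch: model the flat memory offset of element (i, j) in an nrows x ncols
# array under row-major (NumPy/Numba default) and column-major (Julia/Fortran)
# layouts, then check which loop nesting walks memory with stride 1.


def offset_row_major(i, j, nrows, ncols):
    # Row-major ("C" order): consecutive j within a row are adjacent in memory.
    return i * ncols + j


def offset_col_major(i, j, nrows, ncols):
    # Column-major (Julia/Fortran order): consecutive i within a column are adjacent.
    return j * nrows + i


def access_strides(offset, nrows, ncols, outer="rows"):
    """Return the set of distances between consecutive memory accesses
    for a doubly nested loop whose outer loop runs over `outer`."""
    offsets = []
    if outer == "rows":
        for i in range(nrows):
            for j in range(ncols):
                offsets.append(offset(i, j, nrows, ncols))
    else:
        for j in range(ncols):
            for i in range(nrows):
                offsets.append(offset(i, j, nrows, ncols))
    return {b - a for a, b in zip(offsets, offsets[1:])}


if __name__ == "__main__":
    # Matching loop order to layout gives purely sequential (stride-1) access.
    print(access_strides(offset_row_major, 3, 4, outer="rows"))  # {1}
    print(access_strides(offset_col_major, 3, 4, outer="cols"))  # {1}
    # Mismatched loop order jumps around memory instead.
    print(access_strides(offset_col_major, 3, 4, outer="rows"))
```

In other words, a row-by-row loop ported unchanged from Numba to Julia ends up in the mismatched case, which hurts cache use and makes it harder for the compiler to vectorize the inner loop.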
To answer Tim Holy's point: Numba does perform SIMD vectorization, but I haven't tested it extensively in the latest releases. For now, I'll port some of the more compact simulations I'm working on to both Julia and Python/Numba for a less trivial benchmark and see how the performance compares there.
