@enaaaab450 - WELCOME! Also, 3 things:
1. PGO can also be very useful (2..3X) - or not help at all, or even hurt! - and may be worth a try: <https://forum.nim-lang.org/t/6295>

2. To expand on why recursive Fibonacci is a terrible benchmark (it absolutely is; @Araq is 100% right): some backend compilers can partially inline the recursion, which for recursive Fibonacci makes performance exponentially sensitive (roughly 1.62^depth) to how much inlining happens. To compound the problem (and to explain why there may or may not be exceptions, or why it may get better with C++), backend optimizers are [very finicky](https://forum.nim-lang.org/t/4253#26492) about the exact shape of the code before they engage this optimization. So, unless you are doing [gprof](https://www.man7.org/linux/man-pages/man1/gprof.1.html)-style call counting, you have not really known how much actual work recursive Fibonacci does for something like the last 10..15 years. On modern CPUs the number of funcalls could be off by something like 32X from the naive count; if you could steer the amount of inlining, you could tune that to almost anything (the first sketch after this list makes the arithmetic concrete). { This bad benchmark will live on forever, as they all do, just as I was trying to understand literally 50-year-old Fortran code yesterday. I think people underestimate how much `f2py` made Python take off for numerical work. }

3. If you care enough about timing precision to do 10 warm-up runs, you will probably get better accuracy for the happy-path/hot-cache time via [bu/tim.nim](https://github.com/c-blake/bu/blob/main/tim.nim) than with hyperfine. Specifically, the minimum time is the least contaminated by the huge network of queues and caches that any modern CPU/timesharing OS has. For A/B time comparison one does want an error bar - so you want _some_ error estimate on the estimator of the minimum; `tim` just uses repeated runs for that (the second sketch after this list shows the idea). { Or else you probably do not want a summary number at all, but rather _the whole distribution function_ to display the aforementioned vulnerability, or at least 20 or more quantiles. A good estimate of that requires far more data than the minimum and may also be very hard to make reproducible, given its sensitivity to the entire state of a test system. }
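To make the exponential-sensitivity arithmetic in point 2 concrete, here is a small Nim sketch (the inlining depths are illustrative assumptions, not measurements): textbook fib(n) makes 2*fib(n+1) - 1 invocations, and if a backend folds the recursion k levels deep, only roughly a 1/1.62^k fraction of those remain real function calls - about 29X fewer at k = 7, the same ballpark as the 32X above.

```nim
# Sketch only: counts the *logical* invocations of textbook fib(n) and shows
# how fast the real call count could shrink for hypothetical inlining depths.
import std/[math, strformat]

proc fib(n: int): int =
  if n < 2: n else: fib(n - 1) + fib(n - 2)

proc calls(n: int): int = 2*fib(n + 1) - 1    # invocations made by fib(n)

when isMainModule:
  let n = 30
  echo &"fib({n}) = {fib(n)}; logical calls = {calls(n)}"
  let phi = (1.0 + sqrt(5.0)) / 2.0           # ~1.618, the growth rate
  for k in [3, 5, 7, 9]:                      # hypothetical inlining depths
    echo &"inlined {k} levels deep: ~{pow(phi, k.float):.1f}X fewer real calls"
```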
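And for point 3, a minimal sketch of the "minimum of repeated runs" idea - this is the concept only, not tim.nim's actual implementation, and `./fib 30`, the 10 runs, and the 5 trials are all made-up placeholders: take the minimum over several runs as the hot-cache estimate, then repeat the whole experiment a few times to get a crude error bar on that minimum.

```nim
# Sketch only: min-of-N wall times as the hot-cache estimate, with the spread
# across repeated experiments as a crude error bar on that minimum.
import std/[monotimes, times, osproc, algorithm, strformat]

proc runOnce(cmd: string): float =
  let t0 = getMonoTime()
  discard execCmd(cmd)                         # includes shell/exec overhead
  (getMonoTime() - t0).inNanoseconds.float / 1e9

proc minOfRuns(cmd: string; runs = 10): float =
  result = runOnce(cmd)
  for _ in 2 .. runs:
    result = min(result, runOnce(cmd))

when isMainModule:
  let cmd = "./fib 30"                         # hypothetical benchmark command
  var mins: seq[float]
  for _ in 1 .. 5:                             # repeat for a crude error bar
    mins.add minOfRuns(cmd)
  mins.sort()
  let spread = mins[^1] - mins[0]
  echo &"min time {mins[0]:.4f}s  (spread over {mins.len} trials: {spread:.4f}s)"
```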
