There have been fewer complaints lately, and in some ways that's a sign of a
growing community, which is good, but resurrecting zombie threads with
irrelevant side points is a pet peeve
[of](https://forum.nim-lang.org/t/1263#10129)
[others](https://forum.nim-lang.org/t/21#4446). So, @jibal may want to work on
being less easily triggered, or else open new threads & link back in the future.
Since @jibal did resurrect this one to the top of the index _and_ follow on to
my attempt to appease him with (self-righteous|self-deprecating),
defending-the-innocent justification, I felt a reply was warranted, but will add
some hopefully interesting non-redundant information along the way.
The only thing I said that was incorrect in the paragraph that @jibal quoted was
to suggest `OP.fib(0)` is correct (even that was qualified by "I think" as in "I
suspect", and that slight error was already granted just above). Just from the
full text of the post @jibal quotes, I clearly explicitly took @miran's
`fib(2)` example to indicate an issue with index origin (meaning 0 1 1 2.. vs 1
1 2... or even 0 0 0 1 1 2...). `OP.fib(n)` is fine for `n >= 1`, matching the
"1 1 2 3 5.." variant after the very first slot in the F(0)=F(1)=1 case (vs
F(1)=F(2)=1 or whatever). "old school" may be wrong since Fibonacci himself
probably did F(1)=F(2)=1, but that's pretty vague. Maybe the "new school" is
the "old school", etc. It's a common enough gotcha that it is addressed right
at the beginning of the wiki page I linked to.
Anyway, while it's true the OP boundary condition was wrong, and `fib(0)`
should be 1 there, @miran's text suggests a different issue, since in 0-origin
terms Fib(2)=2 _as does the OP code_. Sure, @miran _may_ have meant _both_ the
missing 1 _and_ the indexing shift, but, given the literal text of his reply,
my quoted paragraph still seems appropriate. Many apologies to @miran if he
knows for sure that he was offended at the time, if he can even remember his
thoughts 2 years later - surely partly why zombie thread resurrection is
disliked.
Besides resurrecting to correct my "maybe-miscorrection", @jibal's point is
_also_ irrelevant to the main question (which @miran very likely understood
from his "btw", though more allowance for in-the-moment conversational flow is
very natural and I'm not criticizing that). The main topic of this thread was
_cross-backend comparisons_. You can call the 0 1 2 3 5.. series "Fibtastic" if
you want, but Fibtastic(46) still equals 0-index-origin Fibonacci(46) and still
had a performance mystery. { Fibtastic does one less level of recursion, making
it ~1.62x faster for large n, but that only matters for _cross-impl_ comparisons
if someone naively thought "fibtastic(46) == fibonacci0origin(46) => runtimes
should be equal". }
That returns us to the main response to the actual main question, to which I can
add some more information that may help someone, somewhere, someday. You
_really_ need call counts because the work done can vary **_a lot_** depending
on what backend compilers do, and that's become hard to predict/control. For
example, in my FP C program above (yes, it does 0 1 1 2 3 5.., so 46 -> 47,
depending on/if you care, and yeah, C code in a Nim forum, but the backend has
enough noise!), the time can vary tons from _many_ minor code
changes. On gcc-10.2 -O3 on Gentoo Linux x86_64, glibc-2.32-r1, compiled and run
as-is, it runs in 0.137 seconds now (i7-6700k at 4.7GHz), doing only 63041957
fibonacci calls. Changing the `printf` to `exit((int)fibonacci(46) & 255);`
makes it 2.3x slower, doing 148876861 calls (2.36x). Instead, add a global
`count` variable, `count+=1` at the top of `fibonacci`, & add it to the printf
in `NimMainInner` - 4.49 seconds (32.8x slower) and 1980780474 calls (
**_31.42x_** ). Etc., etc. Time ratios track call count well enough, but call
count varies a lot -- and it's not _just_ the C vs C++ backend and not just the
NimMainInner call structure. Many things (possibly like @adrianv's
`if`-reordering) can impact it. But what is it? I may not have explained it (or
even understood it) well in the past.
Disassembling NimMainInner (for the fast variant), you can see gcc generated
fib(42), fib(41), .. fib(36) or some such and does the (cheap) linear
combination. So, what gcc is really doing is **_"partial inlining of
recursion"_** (that's probably a better term than "unpacking" or "unrolling"..
Sorry!). In this case, that pays huge dividends because the recursion cost of
fib(42) is exponentially cheaper (base of golden ratio) than fib(46) and the
others cheaper still. gcc applies similar tactics inside the recursive
function, compounding the benefit. (You can do a similar thing just manually
expanding the recursion yourself with a Pascal's triangle.)
This partial inlining is distinct from the ["tail call" -> iteration style
transform](https://forum.nim-lang.org/t/2261#42062) (which, e.g., does not
change code size much based on the size of the loop) or the [remember the
"first 34"](https://forum.nim-lang.org/t/4253#26491) compile-time trick, which
uses more data state. A future compiler could surely inline even more levels
**_or_** compose the memoization with partial inlining to create >>30x work
variations, although 1.5 orders of magnitude is already a lot! These tricks
also do not rely upon FP arithmetic or vectorization, though that helps gcc for
me. So, this "benchmark" will only become more fraught with peril in the future
which would not be so bad except that it's so darn popular. In its 2nd life,
maybe it can become a measure of these tricks not "basic language costs" as it
tends to be now...
The optimization also _might_ apply to other recursive call scenarios that
don't have the hyper-efficient alternatives Fibonacci does, although rarely,
given how finicky it seems today. What has kept it so finicky? I don't know.
Ask the gcc maintainers (or the clang people why they can't do it, if that's
even still the case). I was honestly a little surprised the copy-paste of years
old code worked for me here.
Now, maybe @adrianv's results came from something else. I never said I knew for
sure, just that this was the best first thing to check. I could only say so
much with so little info and ability to reproduce. The initial ratios, both my
& @miran's inability to get the same 8.5x ratio, and @adrianv's last-post
timing variations all smell like this to me. I stand by my main advice to just
**_Always Measure Calls As Well as Time_**. That's literally the most (only??)
important point in this whole thread that deserves repeating far & wide. (It's
also a good exercise to learn to compile the Nim generated C code manually to
do PGO builds anyway.) @adrianv, _if_ he's even still around Nim/the
forum/etc., may well _no longer even have access_ to the same build environment
(none of which he told us) to reproduce his found 8.5x integer code effect -
**_yet another_** reason to dislike zombie threads. I know I sure don't have
easy access to the ancient gcc-5 setup with which I first found the FP behavior
myself.
With this I've said about as much on this topic as I likely will ever care to
in one convenient link-back. Cheers to all & keep the threads fresh! ;-)