@enaaaab450 - WELCOME! Also, 3 things:
1. PGO can also be very useful (2..3X) - or not help at all, or even hurt! - and may be worth a try: <https://forum.nim-lang.org/t/6295>

2. To expand on why recursive Fibonacci is a terrible benchmark (it absolutely is; @Araq is 100% right): some backend compilers can partially inline the recursion, which for recursive Fibonacci makes performance exponentially sensitive (roughly 1.62^depth) to how much inlining happens. To compound the problem (and to explain why there may or may not be exceptions, or why it may get better with C++), backend optimizers are [very finicky](https://forum.nim-lang.org/t/4253#26492) about the exact shape of the code before they engage this optimization. So, unless you are doing [gprof](https://www.man7.org/linux/man-pages/man1/gprof.1.html)-style call counting, you have not really known how much actual work recursive Fibonacci does for something like the last 10..15 years. On modern CPUs the number of funcalls could be off by something like 32X from the naive count; if you could steer the amount of inlining, you could tune that to almost anything (the first sketch after this list makes the arithmetic concrete). { This bad benchmark will live on forever, as they all do, just as I was trying to understand literally 50-year-old Fortran code yesterday. I think people underestimate how much `f2py` made Python take off for numerical work. }

3. If you care enough about timing precision to do 10 warm-up runs, you will probably get better accuracy for the happy-path/hot-cache time via [bu/tim.nim](https://github.com/c-blake/bu/blob/main/tim.nim) than with hyperfine. Specifically, the minimum time is the least contaminated by the huge network of queues and caches that any modern CPU/timesharing OS has. For A/B time comparison one does want an error bar - so you want _some_ error estimate on the estimator of the minimum; `tim` just uses repeated runs for that (the second sketch after this list shows the idea). { Or else you probably do not want a summary number at all, but rather _the whole distribution function_ to display the aforementioned vulnerability, or at least 20 or more quantiles. A good estimate of that requires far more data than the minimum and may also be very hard to make reproducible, given its sensitivity to the entire state of a test system. }
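To make the exponential-sensitivity arithmetic in point 2 concrete, here is a small Nim sketch (the inlining depths are illustrative assumptions, not measurements): textbook fib(n) makes 2*fib(n+1) - 1 invocations, and if a backend folds the recursion k levels deep, only roughly a 1/1.62^k fraction of those remain real function calls - about 29X fewer at k = 7, the same ballpark as the 32X above.

```nim
# Sketch only: counts the *logical* invocations of textbook fib(n) and shows
# how fast the real call count could shrink for hypothetical inlining depths.
import std/[math, strformat]

proc fib(n: int): int =
  if n < 2: n else: fib(n - 1) + fib(n - 2)

proc calls(n: int): int = 2*fib(n + 1) - 1    # invocations made by fib(n)

when isMainModule:
  let n = 30
  echo &"fib({n}) = {fib(n)}; logical calls = {calls(n)}"
  let phi = (1.0 + sqrt(5.0)) / 2.0           # ~1.618, the growth rate
  for k in [3, 5, 7, 9]:                      # hypothetical inlining depths
    echo &"inlined {k} levels deep: ~{pow(phi, k.float):.1f}X fewer real calls"
```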
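And for point 3, a minimal sketch of the "minimum of repeated runs" idea - this is the concept only, not tim.nim's actual implementation, and `./fib 30`, the 10 runs, and the 5 trials are all made-up placeholders: take the minimum over several runs as the hot-cache estimate, then repeat the whole experiment a few times to get a crude error bar on that minimum.

```nim
# Sketch only: min-of-N wall times as the hot-cache estimate, with the spread
# across repeated experiments as a crude error bar on that minimum.
import std/[monotimes, times, osproc, algorithm, strformat]

proc runOnce(cmd: string): float =
  let t0 = getMonoTime()
  discard execCmd(cmd)                         # includes shell/exec overhead
  (getMonoTime() - t0).inNanoseconds.float / 1e9

proc minOfRuns(cmd: string; runs = 10): float =
  result = runOnce(cmd)
  for _ in 2 .. runs:
    result = min(result, runOnce(cmd))

when isMainModule:
  let cmd = "./fib 30"                         # hypothetical benchmark command
  var mins: seq[float]
  for _ in 1 .. 5:                             # repeat for a crude error bar
    mins.add minOfRuns(cmd)
  mins.sort()
  let spread = mins[^1] - mins[0]
  echo &"min time {mins[0]:.4f}s  (spread over {mins.len} trials: {spread:.4f}s)"
```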
