Thank you for your advice, for running the benchmark code on many CPUs with different compiler options, and for sharing the results.
> Possibly harmful to result interpretation esp. cross-CPUs with things like
> RPi/ARM: The minimum over 101 runs in your bench template is good to reduce
> noise from CPU spin up to higher clock rates, BUT the two `sleep()`s are bad
> since you are probably giving the CPU/OS time to put the CPU back into a
> lower power mode. Essentially, you are doing one thing to make results less
> sensitive to other work happening on the system and another to make it more
> sensitive. Unless you have a very specific workload in mind, it's usually
> best to pick a direction.

When I ran `testgcd.nim` 3 times on a Raspberry Pi 3 without the sleep inside the loop of the `bench` template, as in the following code, I got unstable timings.

```nim
import std/[monotimes, os, times]

template bench(repeat: int; init, body: untyped): untyped =
  var minTime = initDuration(days = 2)
  for i in 1 .. repeat:
    init
    let start = getMonoTime()
    body
    let finish = getMonoTime()
    minTime = min(minTime, finish - start)
    # Prevent thermal throttling.
    #sleep(20)
  echo minTime.inMicroseconds, " micro second"
  sleep(2000)
```

In the first and second results, the timing of `gcdLAR` is 9410 microseconds, but in the third result it goes down to 3996 microseconds. In the first result, the timing of `gcdSub` is 9178 microseconds, but in the second and third results it is about 3900 microseconds.
```
$ nim c -r -d:release testgcd.nim
gcd in stdlib: 7324 micro second
gcdLAR: 9410 micro second
gcdLAR2: 4055 micro second
gcdLAR3: 5013 micro second
gcdLAR4: 8022 micro second
gcdSub: 9178 micro second
gcdSub2: 7456 micro second
1638163816381638163816381638
[alarm@rasp proj]$ ./testgcd
gcd in stdlib: 7324 micro second
gcdLAR: 9410 micro second
gcdLAR2: 4055 micro second
gcdLAR3: 5013 micro second
gcdLAR4: 8023 micro second
gcdSub: 3892 micro second
gcdSub2: 3163 micro second
1638163816381638163816381638
[alarm@rasp proj]$ ./testgcd
gcd in stdlib: 7324 micro second
gcdLAR: 3996 micro second
gcdLAR2: 4050 micro second
gcdLAR3: 5012 micro second
gcdLAR4: 8024 micro second
gcdSub: 3886 micro second
gcdSub2: 3163 micro second
1638163816381638163816381638
```

When I ran `testgcd` 3 times with `sleep(20)` inside the loop of the `bench` template, I got almost the same results each time.

```
$ nim c -r -d:release testgcd.nim
gcd in stdlib: 3115 micro second
gcdLAR: 3992 micro second
gcdLAR2: 4053 micro second
gcdLAR3: 5020 micro second
gcdLAR4: 3423 micro second
gcdSub: 3894 micro second
gcdSub2: 3171 micro second
1638163816381638163816381638
[alarm@rasp proj]$ ./testgcd
gcd in stdlib: 3115 micro second
gcdLAR: 3995 micro second
gcdLAR2: 4056 micro second
gcdLAR3: 5019 micro second
gcdLAR4: 3425 micro second
gcdSub: 3910 micro second
gcdSub2: 3176 micro second
1638163816381638163816381638
[alarm@rasp proj]$ ./testgcd
gcd in stdlib: 3206 micro second
gcdLAR: 4111 micro second
gcdLAR2: 4181 micro second
gcdLAR3: 5020 micro second
gcdLAR4: 3417 micro second
gcdSub: 3911 micro second
gcdSub2: 3174 micro second
1638163816381638163816381638
```

Maybe this is not caused by thermal throttling: the CPU temperature didn't rise much after running `testgcd`. There are many `Undervoltage detected!` warnings in dmesg, and that might be what causes the unstable results. My Raspberry Pi 3 has neither a heat sink nor a good power supply, so it might be inadequate for benchmarking.
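One way to check whether under-voltage actually throttled the CPU during a run would be to decode the firmware's throttling flags. The following is only a rough sketch: it assumes the `vcgencmd get_throttled` tool is installed (it may not be on every distribution) and that it prints a line such as `throttled=0x50005`. The helper name `decodeThrottled` and the sample value are hypothetical and not part of `testgcd.nim`.

```nim
import std/strutils

# Bit positions as described in the Raspberry Pi documentation for
# `vcgencmd get_throttled` (taken as an assumption in this sketch).
const throttleFlags = [
  (0, "under-voltage detected now"),
  (1, "ARM frequency capped now"),
  (2, "currently throttled"),
  (16, "under-voltage has occurred"),
  (17, "ARM frequency capping has occurred"),
  (18, "throttling has occurred"),
]

proc decodeThrottled(line: string): seq[string] =
  ## Hypothetical helper: `line` is expected to look like "throttled=0x50005".
  let hexPart = line.strip().split('=')[^1]
  let value = parseHexInt(hexPart)  # accepts an optional "0x" prefix
  for flag in throttleFlags:
    if (value and (1 shl flag[0])) != 0:
      result.add flag[1]

when isMainModule:
  # Sample value for illustration only; not measured on my board.
  for meaning in decodeThrottled("throttled=0x50005"):
    echo meaning
```

If the "has occurred" flags are set after a benchmark run but not before it, the timings from that run are probably not trustworthy.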