Thank you for your advice, for running the benchmark code on many CPUs with
different compiler options, and for sharing the results.

> Possibly harmful to result interpretation esp. cross-CPUs with things like 
> RPi/ARM: The minimum over 101 runs in your bench template is good to reduce 
> noise from CPU spin up to higher clock rates, BUT the two sleep() s are bad 
> since you are probably giving the CPU/OS time to put the CPU back into a 
> lower power mode. Essentially, you are doing one thing to make results less 
> sensitive to other work happening on the system and another to make it more 
> sensitive. Unless you have a very specific workload in mind, it's usually 
> best to pick a direction.

When I ran `testgcd.nim` 3 times on a Raspberry Pi 3 without the sleep inside
the loop of the `bench` template, as in the following code, I got unstable
timings.
    
    
    import std/[monotimes, os, times]

    template bench(repeat: int; init, body: untyped): untyped =
      # Smallest duration seen so far; starts far above any real timing.
      var minTime = initDuration(days = 2)
      for i in 1 .. repeat:
        init
        let start = getMonoTime()
        body
        let finish = getMonoTime()
        minTime = min(minTime, finish - start)
        # Prevent thermal throttling (disabled for this test).
        #sleep(20)
      echo minTime.inMicroseconds, " micro second"
      # Pause after each benchmark.
      sleep(2000)
    
    

In the first and second results, the timing of `gcdLAR` is 9410 microseconds,
but in the third result it goes down to 3996 microseconds. In the first result,
the timing of `gcdSub` is 9178 microseconds, but in the second and third
results it is about 3900 microseconds.
    
    
    $ nim c -r -d:release testgcd.nim
    gcd in stdlib: 7324 micro second
    gcdLAR:        9410 micro second
    gcdLAR2:       4055 micro second
    gcdLAR3:       5013 micro second
    gcdLAR4:       8022 micro second
    gcdSub:        9178 micro second
    gcdSub2:       7456 micro second
    1638163816381638163816381638
    [alarm@rasp proj]$ ./testgcd
    gcd in stdlib: 7324 micro second
    gcdLAR:        9410 micro second
    gcdLAR2:       4055 micro second
    gcdLAR3:       5013 micro second
    gcdLAR4:       8023 micro second
    gcdSub:        3892 micro second
    gcdSub2:       3163 micro second
    1638163816381638163816381638
    [alarm@rasp proj]$ ./testgcd
    gcd in stdlib: 7324 micro second
    gcdLAR:        3996 micro second
    gcdLAR2:       4050 micro second
    gcdLAR3:       5012 micro second
    gcdLAR4:       8024 micro second
    gcdSub:        3886 micro second
    gcdSub2:       3163 micro second
    1638163816381638163816381638
    
    

When I ran `testgcd` 3 times with `sleep(20)` inside the loop of the `bench`
template, I got almost the same results each time.
    
    
    $ nim c -r -d:release testgcd.nim
    gcd in stdlib: 3115 micro second
    gcdLAR:        3992 micro second
    gcdLAR2:       4053 micro second
    gcdLAR3:       5020 micro second
    gcdLAR4:       3423 micro second
    gcdSub:        3894 micro second
    gcdSub2:       3171 micro second
    1638163816381638163816381638
    [alarm@rasp proj]$ ./testgcd
    gcd in stdlib: 3115 micro second
    gcdLAR:        3995 micro second
    gcdLAR2:       4056 micro second
    gcdLAR3:       5019 micro second
    gcdLAR4:       3425 micro second
    gcdSub:        3910 micro second
    gcdSub2:       3176 micro second
    1638163816381638163816381638
    [alarm@rasp proj]$ ./testgcd
    gcd in stdlib: 3206 micro second
    gcdLAR:        4111 micro second
    gcdLAR2:       4181 micro second
    gcdLAR3:       5020 micro second
    gcdLAR4:       3417 micro second
    gcdSub:        3911 micro second
    gcdSub2:       3174 micro second
    1638163816381638163816381638
    
    
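
Because `bench` keeps only the minimum, a run where the clock changed partway through looks the same as a steady one. A minimal sketch of one way to expose that, assuming a hypothetical `benchStats` proc and `BenchStats` type (neither is part of `testgcd.nim`, and this uses a proc with a callback rather than the original template): it keeps every sample and reports the minimum, median and maximum, so a large gap between minimum and median hints that the clock was not stable.

```nim
import std/[algorithm, monotimes, times]

type BenchStats = object
  minUs, medianUs, maxUs: int64

proc summarize(samples: seq[int64]): BenchStats =
  ## Returns the minimum, median and maximum of the samples (microseconds).
  doAssert samples.len > 0
  let s = sorted(samples)
  BenchStats(minUs: s[0], medianUs: s[s.len div 2], maxUs: s[^1])

proc benchStats(repeat: int; body: proc ()): BenchStats =
  ## Hypothetical variant of `bench`: times `body` `repeat` times
  ## and keeps every sample instead of only the smallest one.
  var samples = newSeq[int64]()
  for i in 1 .. repeat:
    let start = getMonoTime()
    body()
    samples.add(inMicroseconds(getMonoTime() - start))
  summarize(samples)

# Usage: time a small dummy workload 11 times.
let stats = benchStats(11, proc () =
  var acc = 0
  for k in 1 .. 100_000:
    acc = acc xor k
  doAssert acc >= 0)
echo "min ", stats.minUs, " / median ", stats.medianUs,
     " / max ", stats.maxUs, " micro second"
```
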

Maybe it is not caused by thermal throttling: the CPU temperature didn't rise
much after running `testgcd`. There are many `Undervoltage detected!` warnings
in dmesg, and under-voltage might be causing the unstable results. My Raspberry
Pi 3 has neither a heat sink nor a good power supply, so it might be inadequate
for benchmarking.
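
To check whether the clock really drops between runs, the current frequency could be printed next to each timing. The following is only a sketch under assumptions: it reads the standard Linux cpufreq sysfs file `scaling_cur_freq` (value in kHz), which I assume the Pi's kernel exposes, and `parseKHz` and `currentMHz` are hypothetical helper names.

```nim
import std/[os, strutils]

# Assumption: the kernel exposes cpufreq for cpu0 at this sysfs path.
const freqPath = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq"

proc parseKHz(text: string): int =
  ## Parses the contents of a cpufreq file (a decimal number in kHz).
  parseInt(text.strip())

proc currentMHz(path = freqPath): int =
  ## Returns the current clock in MHz, or -1 if the file is unavailable.
  if not fileExists(path):
    return -1
  parseKHz(readFile(path)) div 1000

echo "cpu0 clock: ", currentMHz(), " MHz"
```

Calling `currentMHz()` right before and after each `bench` would show whether a slow result coincides with a lower clock.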
