With the gcc backend, "profile guided optimization" (PGO) can often help,
especially by letting measured profiles drive inlining choices. For example,
with just @Stefan_Salewski's command line I get:
    
    
    julia1: 84 ms (sum of pixels: 27677748)
    julia2: 83 ms (sum of pixels: 27677748)
    julia3: 82 ms (sum of pixels: 27677748)

while doing this:
    
    
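    # 1. Generate the C sources only (-c); nim does not call gcc here
    nim c -d:danger --panics:on -c t.nim
    # 2. Build an instrumented binary that records profile data when run
    gcc -O3 -flto -fprofile-generate -I/usr/lib/nim/lib ~/.cache/nim/r/t/*.c -o pg
    # 3. Training run: exercise the program on its benchmark workload
    ./pg
    # 4. Rebuild, letting gcc optimize using the recorded profile
    gcc -O3 -flto -fprofile-use -I/usr/lib/nim/lib ~/.cache/nim/r/t/*.c -o t-final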

I get: 
    
    
    julia1: 82 ms (sum of pixels: 27677748)
    julia2: 82 ms (sum of pixels: 27677748)
    julia3: 82 ms (sum of pixels: 27677748)

So PGO "flattened" the performance a bit more: all three variants now run in
82 ms. In this example the PGO speed-up was close to zero, within measurement
error, but I have seen speed-ups as high as 2x for more complicated programs.
So it's worth having a little "nim-pgo" wrapper script to automate the above
if you are writing programs that have an easy "benchmark run".
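A minimal sketch of such a wrapper, written as a POSIX shell function you could source from your shell profile. It follows the same steps as the commands above; the function name `nim_pgo`, the `NIM_LIB`/`NIMCACHE` overrides and the `$PWD/nimcache-<name>` cache directory are assumptions of this sketch (passing `--nimcache:` explicitly just avoids depending on nim's default cache location). It also reuses one `-o` name for both gcc passes as a precaution, since newer gcc versions may name the `.gcda` profile files after the output when compiling and linking in one command.

```shell
# nim_pgo: hypothetical helper automating the two-pass gcc PGO build.
# Usage: nim_pgo file.nim [args passed to the training run...]
# Env overrides (assumptions of this sketch, not from the post):
#   NIM_LIB  - directory containing nimbase.h (default: the path used above)
#   NIMCACHE - where nim writes the generated C files
# The body runs in a subshell "( ... )", so `exit` below only ends this
# function (with the failing command's status), never the calling shell.
nim_pgo() (
  if [ $# -lt 1 ]; then
    echo "usage: nim_pgo file.nim [training-run args...]" >&2
    exit 2
  fi
  src=$1; shift
  name=$(basename "$src" .nim)
  cache=${NIMCACHE:-$PWD/nimcache-$name}
  nimlib=${NIM_LIB:-/usr/lib/nim/lib}

  # 1. Generate the C sources only (-c): nim does not call gcc here.
  nim c -d:danger --panics:on -c --nimcache:"$cache" "$src" || exit
  # 2. Build an instrumented binary that records profile data when run.
  gcc -O3 -flto -fprofile-generate -I"$nimlib" "$cache"/*.c -o "$name" || exit
  # 3. Training run: should exercise the program's typical "benchmark run".
  "./$name" "$@" || exit
  # 4. Rebuild using the recorded profile. Same -o name as step 2, as a
  #    precaution in case gcc derives the profile file names from it.
  gcc -O3 -flto -fprofile-use -I"$nimlib" "$cache"/*.c -o "$name"
)
```

Usage would then be e.g. `nim_pgo t.nim`, leaving the PGO-optimized binary as `./t`; any extra arguments are passed on to the training run.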
