Just wanted to share some info about using LTO + PGO with Clang for Nim.
First of all, keep in mind that PGO is not always a win: it optimizes the
code paths exercised during the profiling run, so cases the profile didn't
cover may end up even slower.
The process:
* Compile your application with profiling instrumentation enabled:
`nim c -d:danger --cc:clang --passC:"-flto -fprofile-instr-generate"
--passL:"-flto -fprofile-instr-generate" file.nim`
* Run it with your typical workloads to generate the profiling data for PGO,
e.g. `./file`.
After that you should have a file named `default.profraw` in the folder where
you ran your program.
* Use
`llvm-profdata merge -o data.profdata default.profraw` to convert the raw
profiling data into a form Clang can use.
* Compile your program again, this time like so (you should be in the same
folder as the `data.profdata` file):
`nim c -d:danger --cc:clang --passC:"-flto -fprofile-instr-use=data.profdata"
--passL:"-flto -fprofile-instr-use=data.profdata" file.nim`
After that you're done and can test your binary to see if you got any
performance boost :)
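The steps above can be strung together in a small script. This is only a sketch: the `SRC`, `PROFDATA` and `DRY_RUN` variables and the `run`, `nim_build` and `pgo_cycle` helpers are names made up for illustration, and it assumes `nim`, `clang` and `llvm-profdata` are on your `PATH`. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/bin/sh
# Sketch of the instrument -> run -> merge -> rebuild cycle.
# All variable and function names here are illustrative assumptions.
SRC="${SRC:-file.nim}"
PROFDATA="${PROFDATA:-data.profdata}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them too

# Print a command (shell quoting is not shown in the echoed line)
# and execute it unless we are in dry-run mode.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Build with the same flags passed to both the C compiler and the linker.
nim_build() {
  run nim c -d:danger --cc:clang "--passC:$1" "--passL:$1" "$SRC"
}

pgo_cycle() {
  # 1. Instrumented build.
  nim_build "-flto -fprofile-instr-generate"
  # 2. Run a typical workload; this writes default.profraw.
  run "./${SRC%.nim}"
  # 3. Merge the raw profile into the format Clang reads.
  run llvm-profdata merge -o "$PROFDATA" default.profraw
  # 4. Rebuild using the collected profile.
  nim_build "-flto -fprofile-instr-use=$PROFDATA"
}

pgo_cycle
```

Passing the flags through one helper keeps `--passC` and `--passL` in sync, which is easy to get wrong when typing the commands by hand.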
I tried doing that for my `mathexpr` library:

    # Don't mind the nimbench, I know I shouldn't use it :P
    import mathexpr, nimbench

    let e = newEvaluator()
    e.addVars({"a": 3.0, "b": 5.7})

    bench("test", m):
      for x in 1..m:
        var c = e.eval("(a^a + b * 2 - 3*4.2412+5335^2-4e3)^2")
        if c == 0:
          echo "can't"

    runBenchmarks()
No LTO/PGO (also yeah, I'm using gc:arc since it's faster :P) - `nim c
-d:danger --gc:arc --cc:clang -r tests/bench.nim`:

    ============================================================================
    bench.nim                                       relative  time/iter  iters/s
    ============================================================================
    "test"                                                     435.24ns    2.30M
LTO only - `nim c -d:danger --gc:arc --cc:clang --passC:"-flto" --passL:"-flto"
-r tests/bench.nim`:

    "test"                                                     332.45ns    3.01M
LTO+PGO (I won't show all commands, just the last one) - `nim c -r -d:danger
--gc:arc --cc:clang --passC:"-flto -fprofile-instr-use=perf.profdata"
--passL:"-flto -fprofile-instr-use=perf.profdata" tests/bench.nim`:

    "test"                                                     266.02ns    3.76M
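To put the three time/iter figures side by side, here is a quick bit of arithmetic with `awk`. The `speedup` helper is just an illustrative name; the inputs are the nanosecond values from the benchmark output in this post.

```shell
# Speedup relative to the baseline, from time/iter values in ns.
# Usage: speedup <baseline_ns> <optimized_ns>
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN {
    printf "%.2fx faster, %.1f%% less time per iteration\n",
           base / t, (base - t) / base * 100
  }'
}

speedup 435.24 332.45   # LTO only:  1.31x faster, 23.6% less time per iteration
speedup 435.24 266.02   # LTO + PGO: 1.64x faster, 38.9% less time per iteration
```

These numbers come from a single benchmark on one machine, so treat them as a rough indication rather than a general result.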
Thanks for reading :)