Hi all,
I have been working on adding profile-guided optimization (PGO) to LDC [1][2][3]. At this point, I'd like to hear your input and hope you can help with testing!

Unfortunately, to try it out, you will need to build LDC with LLVM 3.7 yourself. PGO should work on OS X, Linux, and Windows.

A first implementation is mostly complete now: it can generate an executable that will output profile data, and it can use profile data during a second compilation pass (and it will tell LLVM about branch frequencies). LDC itself does not perform any profile-guided optimizations (yet); that is left to LLVM.

It works like PGO with Clang, using the -fprofile-instr-generate and -fprofile-instr-use command-line options [4]:

    ldc2 -fprofile-instr-generate=test.profraw -run test.d
    llvm-profdata merge test.profraw -output test.profdata
    ldc2 -fprofile-instr-use=test.profdata test.d -of=test

You should now have the executable "test" with an amazing performance boost ;-)

You can inspect the generated code using LDC's -output-ll switch. Functions should be annotated with call counts, and most branches should be annotated with branch_weights metadata. For example:

    define void @for_loop() #0 !prof !12
    ...
    !12 = !{!"function_entry_count", i64 234}

for "void for_loop()" that is called 234 times, and

    br i1 %3, label %if, label %else, !prof !17
    ...
    !17 = !{!"branch_weights", i32 5, i32 3}

for "if (condition) {...} else {...}".

The branch_weights have an offset of 1, so the above means that the condition was true 4 times and false 2 times. If a certain piece of code is never executed, no metadata is added (i.e. you won't see {!"branch_weights", i32 1, i32 1}). Some branches are intentionally not instrumented/annotated if they lead to terminating code (e.g. array bounds checks and the auto-generated null checks on 'this' at class method entry).
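
To make the offset concrete, here is a rough Python sketch that turns one branch_weights line from the -output-ll output back into execution counts. It is not part of LDC or LLVM: the helper name decode_branch_weights and the constant WEIGHT_OFFSET are made up for illustration, and it assumes the offset of 1 described above and the textual metadata format shown in the example.

```python
import re

# Assumption taken from the post: each stored weight is the real count plus 1.
WEIGHT_OFFSET = 1


def decode_branch_weights(metadata_line):
    """Return the execution counts encoded in one branch_weights metadata line.

    Hypothetical helper for illustration only; it just parses the textual IR.
    """
    if '"branch_weights"' not in metadata_line:
        raise ValueError("not a branch_weights metadata line")
    # Collect every 'i32 <number>' operand in order of appearance.
    weights = [int(w) for w in re.findall(r"i32 (\d+)", metadata_line)]
    # Undo the offset to recover how often each successor was taken.
    return [w - WEIGHT_OFFSET for w in weights]


line = '!17 = !{!"branch_weights", i32 5, i32 3}'
taken, not_taken = decode_branch_weights(line)
print(taken, not_taken)  # prints "4 2"
```

For the example metadata this gives 4 for the "if" side and 2 for the "else" side, matching the counts mentioned above.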

I hope you will be able to test and comment on the work. I am very interested in hearing about performance gains (or losses, or no change) for your programs, and I am curious to learn for what kinds of code it makes a difference in practice.

Thanks!
  Johan

(future work will probably include coverage analysis (llvm-cov) and support for sampling-based profiles, which should fit naturally with the current implementation)

[1] http://wiki.dlang.org/LDC_LLVM_profiling_instrumentation
[2] https://github.com/JohanEngelen/ldc/tree/pgo (warning: I will rebase soon)
[3] https://github.com/ldc-developers/ldc/pull/1219
[4] http://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation
