Re: Silicon Valley D Meetup - December 14, 2017 - "Experimenting with Link Time Optimization" by Jon Degenhardt

Johan Engelen via Digitalmars-d-announce Sat, 16 Dec 2017 03:56:05 -0800

On 11/21/2017 11:58 AM, Ali Çehreli wrote:
Meetup page: https://www.meetup.com/D-Lang-Silicon-Valley/events/245288287/
LDC[1], the LLVM-based D compiler, has been adding Link Time Optimization capabilities over the last several releases. [...]
This talk will look at the results of applying LTO to one set of applications, eBay's TSV utilities[2]. [...]
Jon Degenhardt is a member of eBay's Search Science team.
[...] D quickly became his favorite programming language, one he uses whenever he can.


On Friday, 15 December 2017 at 03:08:35 UTC, Ali Çehreli wrote:

This should be live now:

  http://youtu.be/e05QvoKy_8k


Great! I've added some comments there, pasted here:

Jon, thanks for the extensive talk and testing on LTO!
And thanks for recording / broadcasting :-)

(times are approximate)

7:45 Full vs. Thin LTO, further clarification: Full LTO does optimization and codegen single-threaded (comparable to putting all source in one module). Thin LTO loads each module separately and imports the functions it needs from other modules; after that, optimization and codegen happen in parallel for each module (and normal linking happens afterwards). LTO's capabilities stem from having access to the source code of functions in other modules, and from knowing which functions are internal to the program (so that they can be removed, given a non-ABI-conformant calling convention, etc.; also discussed around 41:30). The importing + optimization that happens at the start of Thin LTO gives you that, with the added advantage of parallel optimization + codegen afterwards.
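To make the Thin LTO flow a bit more concrete, here is a toy Python sketch: each module first pulls in the bodies of functions it calls from other modules, and then the per-module "optimization + codegen" step runs in parallel. The module layout, function names and the helpers `thin_import` / `optimize_and_codegen` are hypothetical; real ThinLTO decides what to import from per-module summaries with size thresholds, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy program: module name -> {function name: [callees]}
MODULES = {
    "app": {"main": ["parse", "sum_fields"]},
    "util": {"parse": ["split"], "split": [], "sum_fields": []},
}

def owner_of(func, modules):
    """Return the module defining func, or None if it is not available as IR."""
    for mod, funcs in modules.items():
        if func in funcs:
            return mod
    return None

def thin_import(mod, modules):
    """Collect (transitively) the functions from other modules that mod calls."""
    local = modules[mod]
    imported = {}
    worklist = [c for callees in local.values() for c in callees]
    while worklist:
        f = worklist.pop()
        if f in local or f in imported:
            continue
        src = owner_of(f, modules)
        if src is None:
            continue  # e.g. defined in a non-IR library: nothing to import
        imported[f] = src
        worklist.extend(modules[src][f])
    return imported

def optimize_and_codegen(mod, modules):
    """Stand-in for per-module optimization: report which function bodies
    (local + imported) are visible to the optimizer for this module."""
    imported = thin_import(mod, modules)
    return mod, sorted(set(modules[mod]) | set(imported))

# The per-module step is independent, so it can run in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda m: optimize_and_codegen(m, MODULES), MODULES))
```

Here `results["app"]` sees `parse`, `split` and `sum_fields` in addition to `main`, which is what enables cross-module inlining. A Full LTO model would instead merge all of `MODULES` into one module and process it on a single thread.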

14:00 If the question was "do you need all libraries to be in IR?": no. LTO works with a mix of IR object files and normal object files and libraries. Even when linking with non-IR libraries, it helps to know that no other object file references a symbol (so you can internalize it and generate better code). But indeed, for _much_ better optimization potential: the more source you have compiled with LTO enabled, the better.
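A rough model of the internalization point, as a Python sketch under simplified assumptions: a symbol defined in IR can be made internal only if no native (non-IR) object file references it and it is not part of the program's exported interface (such as `main`). The symbol names and the `can_internalize` helper are hypothetical; in practice the linker supplies this symbol-resolution information to the LTO step.

```python
def can_internalize(symbol, native_object_refs, exported_symbols):
    """True if an IR-defined symbol may be made internal: nothing outside
    the IR refers to it and it is not part of the exported interface."""
    return symbol not in native_object_refs and symbol not in exported_symbols

# Hypothetical link: three functions compiled to IR with LTO enabled.
ir_defined = {"main", "parseLine", "helper"}
native_refs = {"parseLine"}  # e.g. a callback referenced from a C object file
exported = {"main"}          # the program entry point must stay visible

internal = sorted(s for s in ir_defined if can_internalize(s, native_refs, exported))
```

Only `helper` ends up internal here, so the optimizer is free to inline it everywhere and drop its standalone copy. The more of the program that is compiled to IR, the fewer symbols are pinned by outside references.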

15:30 Whole-source optimization at the D level indeed has higher potential; at the moment I don't think we do many optimizations that are only possible at the D level (so they are done at the IR level, or not at all... I'm working on devirtualization, for example). Extra remark: the first step towards that is a much deeper, well-defined specification of D semantics, in abstract-machine terms.

15:45 Testing == contributing! And your testing has greatly improved LDC's LTO, thanks!

15:50 The ldc-build-runtime tool was made by Martin Kinkelin, and, as you mention, it is the enabler for most of your work.

16:15 LDC LTO on Windows == integrating LLD into LDC (or using lld-link.exe), https://github.com/ldc-developers/ldc/issues/2028

~30:00 IIRC, the performance regression is due to cross-module inlining/optimization (as you mention), which we get for free with LTO :-) (That is not to say that we wouldn't like to do cross-module inlining without LTO.)

33:20 Compilation time. With LTO, machine codegen is skipped during normal compilation, because it is done in the LTO linking step. So the slowdown with Thin LTO may not be too large (Thin LTO being a parallel build). An extreme case where LTO may actually result in faster codegen: if you have 1 million template function instantiations in CTFE that are never called at runtime, LTO may easily discard them before they reach the optimization and machine codegen stage. In such a case, LTO may very well be faster (optimized machine codegen is time-consuming); however, the IR does have to be created and written to disk, and then read back from disk, which takes time too... Overall, Thin LTO is slower than a normal `-O3` build, but only by a small ratio, and it also does more work (the added optimization). The compile-speed difference between Full LTO and Thin LTO is very large (Full LTO is several times slower).
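The "discard them before codegen" step is essentially a reachability pass over the whole program's call graph once unreferenced symbols are known to be internal. A minimal Python sketch, assuming a hypothetical call graph where 1,000 template instances are used only at compile time (scaled down from the million in the example above):

```python
from collections import deque

def live_functions(call_graph, roots):
    """Return the functions reachable from the roots; everything else can be
    discarded before optimization and machine codegen."""
    live = set()
    queue = deque(roots)
    while queue:
        f = queue.popleft()
        if f in live:
            continue
        live.add(f)
        queue.extend(call_graph.get(f, ()))
    return live

# Hypothetical program: main calls one template instance at runtime, while
# many other instances were only needed during CTFE.
call_graph = {"main": ["format!int"], "format!int": []}
for i in range(1_000):
    call_graph[f"ctfeHelper!{i}"] = []

live = live_functions(call_graph, roots=["main"])
dropped = len(call_graph) - len(live)
```

Only `main` and `format!int` survive, so none of the `ctfeHelper!N` bodies pay for optimization or machine codegen. They still cost IR generation and disk I/O, which is the trade-off mentioned above.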

39:40 Indeed, D doesn't require codegen of templates if we can prove that they are already codegenned in the library itself; i.e., you _have_ to _link_ with a template-only library. In C++, codegen of templates is mandatory (AFAIK), and thus you do not have to link with a template-only library (e.g. header-only libraries). In D, this culling of template codegen is done to increase compile speed; in that sense it is not a fair comparison with C++. For cross-module inlining / inlining of templated functions: in C++, all template code is available in each codegenned module, so LTO is not needed to improve things; in D, using LTO makes template code available that otherwise wouldn't be, so you get larger (potentially much larger) relative gains with LTO for D. (This is somewhat particular to LDC currently; GDC does better cross-module inlining; try LDC's `-enable-cross-module-inlining`.)

56:40 I fully share your thinking that cross-module inlining is the main source of performance gains.

Can't wait to see the results of LTO on Weka.io's (LARGE) applications. Work in progress...!

Could you add the reference links in the comment section there too? (You can't click on blue links in the video ;-)


Clearly, I'm very interested in what your PGO testing will show. :-)

Cheers,
  Johan
