This is a very easy-to-read article about the design of LLVM: http://www.drdobbs.com/architecture-and-design/the-design-of-llvm/240001128
It explains what the IR is:
The most important aspect of its design is the LLVM Intermediate Representation (IR), which is the form it uses to represent code in the compiler. LLVM IR [...] is itself defined as a first class language with well-defined semantics.
In particular, LLVM IR is both well specified and the only interface to the optimizer. This property means that all you need to know to write a front end for LLVM is what LLVM IR is, how it works, and the invariants it expects. Since LLVM IR has a first-class textual form, it is both possible and reasonable to build a front end that outputs LLVM IR as text, then uses UNIX pipes to send it through the optimizer sequence and code generator of your choice. It might be surprising, but this is actually a pretty novel property to LLVM and one of the major reasons for its success in a broad range of different applications. Even the widely successful and relatively well-architected GCC compiler does not have this property: its GIMPLE mid-level representation is not a self-contained representation.
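To make that concrete, here is a minimal hand-written sketch of what the textual IR looks like (the file name, function name, and the opt/llc invocation in the comment are just illustrative, and the pass-manager syntax differs between LLVM versions):

    ; sum.ll - a tiny module a front end could emit as plain text
    ; (e.g. opt -S -passes='default<O2>' sum.ll | llc -o sum.s)
    define i32 @sum(i32 %a, i32 %b) {
    entry:
      %c = add i32 %a, %b    ; a single 32-bit integer add
      ret i32 %c
    }

Any front end that can print something like this can hand the rest of the job to the stock optimizer and code generator, exactly as the article describes.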
That IR goes a long way toward making the compiler simpler to debug. I think this is important (and I think it partially explains why Clang was developed so quickly):
Compilers are very complicated, and quality is important, therefore testing is critical. For example, after fixing a bug that caused a crash in an optimizer, a regression test should be added to make sure it doesn't happen again. The traditional approach to testing this is to write a .c file (for example) that is run through the compiler, and to have a test harness that verifies that the compiler doesn't crash. This is the approach used by the GCC test suite, for example.

The problem with this approach is that the compiler consists of many different subsystems and even many different passes in the optimizer, all of which have the opportunity to change what the input code looks like by the time it gets to the previously buggy code in question. If something changes in the front end or an earlier optimizer, a test case can easily fail to test what it is supposed to be testing.

By using the textual form of LLVM IR with the modular optimizer, the LLVM test suite has highly focused regression tests that can load LLVM IR from disk, run it through exactly one optimization pass, and verify the expected behavior. Beyond crashing, a more complicated behavioral test wants to verify that an optimization is actually performed. [...]

While this might seem like a really trivial example, this is very difficult to test by writing .c files: front ends often do constant folding as they parse, so it is very difficult and fragile to write code that makes its way downstream to a constant folding optimization pass. Because we can load LLVM IR as text and send it through the specific optimization pass we're interested in, then dump out the result as another text file, it is really straightforward to test exactly what we want, both for regression and feature tests.
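This is also roughly what the LLVM regression tests look like in practice: a .ll file whose RUN comment tells the test driver how to pipe that same file through opt and FileCheck. A minimal sketch (the test name is made up, and older LLVM releases spell the pass -instcombine rather than -passes=instcombine):

    ; RUN: opt < %s -passes=instcombine -S | FileCheck %s

    ; CHECK-LABEL: @add_zero(
    ; CHECK-NEXT:    ret i32 %x
    define i32 @add_zero(i32 %x) {
      %t = add i32 %x, 0
      ret i32 %t
    }

The test loads the IR from disk, runs exactly one pass over it, and checks that the useless add was folded away, independently of whatever the front end or earlier passes might do to an equivalent .c file.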
Bye, bearophile
