I've been watching the LLVM/LTO discussion with interest. I'm learning that I need to express myself carefully, because people read a lot into what I say, so I've been watching, and talking with lots of people, but not commenting. But, I've gotten a couple of emails asking me what my thoughts are, so here they are! None of what follows is an official position of the FSF, Steering Committee, or even CodeSourcery; it's just my personal thoughts.

First and foremost, I'm not an expert on either Tree-SSA or LLVM, so I'm only qualified to comment at a high level. From what I can see, and by all accounts, LLVM is a clean, well-engineered codebase with good capabilities. Assuming that all of the copyright details are worked out, which Chris is actively trying to do, I think we should consider the costs and benefits of replacing Tree-SSA with LLVM. I'm not sure exactly how the costs and benefits stack up, but we'll see. That shouldn't be read as either a favorable or unfavorable comment about switching; I certainly think we should consider LLVM, but I don't have an opinion as to what the outcome of that consideration ought to be.

For me, the key consideration is the shape of the compiler-goodness graph vs. time, where goodness includes (in no particular order) optimization capability, cross-platform capability, correctness, backwards compatibility, support for link-time optimization, developer happiness, etc. As some others have suggested, if it were up to me to pick (which it's not, since I don't control the developer base, steering committee, etc.), I'd make a big list of things we would have to do to LLVM and things we would have to do to Tree-SSA, and then decide which one looked easier.

The reason the shape of the graph matters to me, rather than just the value at some time t, is that I'm concerned about increasing GCC's overall market share, and market share is sticky, so, ideally, progress is continuous; periods of flatness, or downtrends, are harmful. However, one clearly doesn't want to win in the short term, only to lose big in the long term, so if one of the LLVM or Tree-SSA lines is significantly higher in the foreseeable future, that's probably a bigger consideration than the shape of the graph in the short term.

If we're opening the door to replacing Tree-SSA, are there any other technologies we should consider?
In particular, brushing aside any copyright/patent issues, how would a Tree->WHIRL->RTL widget, using the Open64 technology, stack up relative to Tree-SSA and LLVM? Do any of the Open64 people have interest in integrating with GCC in this way? What are the legal issues and, if there are serious issues, does anyone want to try to resolve them? Again, this should not be read as advocating Open64; these aren't rhetorical questions; I just don't know the answers.

There is one advantage I see in the LTO design over LLVM's design. In particular, the LTO proposal envisions a file format that is roughly at the level of GIMPLE. Such a file format could easily be extended to be at the source-level version of Tree used in the front-ends, so that object files could contain two extra sections: one for LTO and one for source-level information. The latter section could be used for things like C++ "export" -- but, more importantly, for other tools that need source-level information, like IDEs, indexers, checkers, etc. (All tools that presently use the EDG front end would be candidate clients for this interface.) There's a lot of interest in these kinds of tools, and I think their existence would be a competitive advantage for GCC because they would create compelling reasons to use GCC beyond just its capabilities as a compiler. So, at some point, I think we'll probably want (or even need) to add such an interface to GCC.

LLVM's bytecode is in a flat, three-address-code style. That's convenient for optimization, and more compact than Tree, but source-level tools actually want tree data structures, complex expressions, and high-level control-flow primitives (so that they can even do things like distinguish a do-loop from a while-loop). So, it would be a drastic change to try to extend LLVM's bytecode format to present source-level information in this way.

Nothing about LLVM is a step backwards from where we are today, with respect to this kind of tool integration.
It's just that LLVM doesn't particularly advance us in that direction, whereas the infrastructure for the LTO proposal would facilitate this effort, in addition to just LTO. So, a possible advantage of the LTO proposal in this respect is that it might be a faster path to having both LTO and a source-level interface, and leave us with only one set of routines for reading/writing intermediate code to files.

The obvious counter-point is that LLVM is almost certainly a faster path to link-time optimization, since it already works, and that it doesn't in any way prevent us from adding the source-level integration later.

The fact that the LTO proposal "hopes" to perform link-time optimization, whereas LLVM always works, is not an intrinsic aspect of the LTO proposal. In particular, the reason the LTO proposal permits the optimizer to bail out is to provide more type-based aliasing information at link-time by making it possible to distinguish more types. By using a structural type equivalence (and therefore weakening slightly the assumptions that could be made about aliasing), the LTO proposal could be made to always work as well. So, I don't think that's an intrinsic design issue. I have no idea whether the difference in aliasing resolution would make any measurable difference on real code.

Beyond already working, LLVM clearly has other significant advantages, including better memory usage. Nothing based on trees is likely to eliminate the memory gap. Clearly, LLVM's IR is better documented (and simpler) than Tree.

If we do switch to LLVM, it's not going to happen before at least 4.3, and, if I had to guess, not before 4.4.
We learned with Tree-SSA that replacing our optimizers takes a while to shake out, and I'd imagine the same would happen with LLVM; even assuming the existing LLVM code is itself perfect, the LLVM->RTL widget, new LLVM code to support GCC extensions that aren't currently supported, and bugs exposed in the GCC backends by the use of different code paths will all take a while to get right. We'd also have to look at any performance regressions to work out whether those issues represent real problems, and how we should deal with them.

So, since it's going to be a while before we can integrate LLVM, if we do decide that's best, I hope that we'll continue to improve Tree-SSA in the meanwhile. There are some nice projects in the works, and I'd like to encourage people to keep working on them. Moving algorithms from Tree-SSA to LLVM will no doubt be tractable, and we'll be able to benefit from new Tree-SSA optimizations, if/until LLVM is integrated.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304