I've been watching the LLVM/LTO discussion with interest.

I'm learning that I need to express myself carefully, because people
read a lot into what I say, so I've been watching, and talking with lots
of people, but not commenting.  But, I've gotten a couple of emails
asking me what my thoughts are, so here they are!  None of what follows
is an official position of the FSF, Steering Committee, or even
CodeSourcery; it's just my personal thoughts.

First and foremost, I'm not an expert on either Tree-SSA or LLVM, so I'm
only qualified to comment at a high level.  From what I can see, and by
all accounts, LLVM is a clean, well-engineered codebase with good
capabilities.  Assuming that all of the copyright details are worked
out, which Chris is actively trying to do, I think we should consider
the costs and benefits of replacing Tree-SSA with LLVM.  I'm not sure
exactly how the costs and benefits stack up, but we'll see.

That shouldn't be read as either a favorable or unfavorable comment
about switching; I certainly think we should consider LLVM, but I don't
have an opinion as to what the outcome of that consideration ought to be.

For me, the key consideration is the shape of the compiler-goodness
graph vs. time, where goodness includes (in no particular order)
optimization capability, cross-platform capability, correctness,
backwards compatibility, support for link-time optimization, developer
happiness, etc.  Like some others have suggested, if it were up to me to
pick (which it's not, since I don't control the developer base, steering
committee, etc.), I'd make a big list of things we would have to do to
LLVM and things we would have to do to Tree-SSA, and then decide which
one looked easier.

The reason the shape of the graph matters to me, rather than just the
value at some time t, is that I'm concerned about increasing GCC's
overall market share, and market share is sticky, so, ideally, progress
is continuous; periods of flatness, or downtrends, are harmful.
However, one clearly doesn't want to win in the short term, only to lose
big in the long term, so if one of the LLVM or Tree-SSA lines is
significantly higher in the foreseeable future, that's probably a bigger
consideration than the shape of the graph in the short term.

If we're opening the door to replacing Tree-SSA, are there any other
technologies we should consider?  In particular, brushing aside any
copyright/patent issues, how would a Tree->WHIRL->RTL widget, using the
Open64 technology, stack up relative to Tree-SSA and LLVM?  Do any of
the Open64 people have interest in integrating with GCC in this way?
What are the legal issues and, if there are serious issues, does anyone
want to try to resolve them?  Again, this should not be read as
advocating Open64; these aren't rhetorical questions; I just don't know
the answers.

There is one advantage I see in the LTO design over LLVM's design.  In
particular, the LTO proposal envisions a file format that is roughly at
the level of GIMPLE.  Such a file format could easily be extended to be
at the source-level version of Tree used in the front-ends, so that
object files could contain two extra sections: one for LTO and one for
source-level information.  The latter section could be used for things
like C++ "export" -- but, more importantly, for other tools that need
source-level information, like IDEs, indexers, checkers, etc.  (All
tools that presently use the EDG front end would be candidate clients
for this interface.)

There's a lot of interest in these kinds of tools, and I think their
existence would be a competitive advantage for GCC because they would
create compelling reasons to use GCC beyond just its capabilities as a
compiler.  So, at some point, I think we'll probably want (or even need)
to add such an interface to GCC.

LLVM's bytecode is a flat, three-address code style.  That's convenient
for optimization, and more compact than Tree, but source-level tools
actually want tree data structures, complex expressions, and high-level
control-flow primitives (so that they can even do things like
distinguish a do-loop from a while-loop).  So, it would be a drastic
change to try to extend LLVM's bytecode format to present source-level
information in this way.
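
To make the contrast concrete, here is a minimal sketch in Python of a
nested expression tree being lowered to flat three-address code.  It is
a toy illustration only: the names Var, BinOp, and lower are invented
for this example and are neither GCC's Tree nor LLVM's bytecode.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Var:
    """A leaf of the toy expression tree: a named variable."""
    name: str


@dataclass
class BinOp:
    """An interior node: a binary operator applied to two subtrees."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Var, BinOp]


def lower(expr: Expr, code: List[str]) -> str:
    """Append three-address instructions for expr to code.

    Returns the name (variable or temporary) that holds the value of expr.
    """
    if isinstance(expr, Var):
        return expr.name
    left = lower(expr.left, code)
    right = lower(expr.right, code)
    # len(code) grows with every instruction, so each temporary is unique.
    temp = f"t{len(code)}"
    code.append(f"{temp} = {left} {expr.op} {right}")
    return temp


# (a + b) * (c - d) as a tree, the shape a source-level tool wants to see.
tree = BinOp("*", BinOp("+", Var("a"), Var("b")), BinOp("-", Var("c"), Var("d")))
code: List[str] = []
result = lower(tree, code)
print("\n".join(code))
```

Run as written, this prints "t0 = a + b", "t1 = c - d", and
"t2 = t0 * t1" on three lines.  The flat form is easy to optimize, but
it no longer records that the source contained one nested expression,
which is the kind of information a source-level tool would need.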

Nothing about LLVM is a step backwards from where we are today, with
respect to this kind of tool integration.  It's just that LLVM doesn't
particularly advance us in that direction, whereas the infrastructure
for the LTO proposal would facilitate this effort, in addition to just
LTO.  So, a possible advantage of the LTO proposal in this respect is
that it might be a faster path to having both LTO and a source-level
interface, and leave us with only one set of routines for
reading/writing intermediate code to files.  The obvious counter-point
is that LLVM is almost certainly a faster path to link-time
optimization, since it already works, and that it doesn't in any way
prevent us from adding the source-level integration later.

The fact that the LTO proposal "hopes" to perform link-time
optimization, whereas LLVM always works, is not an intrinsic aspect of
the LTO proposal.  In particular, the reason the LTO proposal permits
the optimizer to bail out was to provide more type-based aliasing
information at link-time by making it possible to distinguish more
types.  By using a structural type equivalence (and therefore weakening
slightly the assumptions that could be made about aliasing), the LTO
proposal could be made to always work as well.  So, I don't think that's
an intrinsic design issue.  I have no idea whether the difference in
aliasing resolution would make any measurable difference on real code.
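
The trade-off between the two notions of type equivalence can be
sketched as follows.  This is a toy model under stated assumptions, not
the LTO proposal's actual design: Struct, same_by_name, and
same_by_structure are hypothetical names, and "the same type" stands in
for "accesses through these types may alias".

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Struct:
    """A toy record type: a name plus the types of its fields, in order."""
    name: str
    fields: Tuple["Type", ...]


# A type is either a scalar type name (str) or a Struct.
Type = Union[str, Struct]


def same_by_name(a: Struct, b: Struct) -> bool:
    """Name equivalence: two structs are the same type only if named alike."""
    return a.name == b.name


def same_by_structure(a: Type, b: Type) -> bool:
    """Structural equivalence: compare field types recursively, ignoring names."""
    if isinstance(a, str) or isinstance(b, str):
        return a == b
    return len(a.fields) == len(b.fields) and all(
        same_by_structure(x, y) for x, y in zip(a.fields, b.fields)
    )


# Two types with identical layout but different names (hypothetical example).
point = Struct("point", ("int", "int"))
size = Struct("size", ("int", "int"))

# Name equivalence keeps them distinct, so type-based alias analysis may
# assume that a store through a point* does not affect a load through a size*.
print(same_by_name(point, size))

# Structural equivalence merges them, so the optimizer must assume they may
# alias: fewer distinguishable types, slightly weaker aliasing assumptions.
print(same_by_structure(point, size))
```

In this sketch the name-based check distinguishes more types, while the
structural check only ever inspects layouts and so can always be
computed.  Whether the lost precision matters on real code is exactly
the open question.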

In addition to already working, LLVM clearly has other significant
advantages, including better memory usage.  Nothing based on trees is
likely to eliminate the memory gap.  Clearly, LLVM's IR is better
documented (and simpler) than Tree.

If we do switch to LLVM, it's not going to happen before at least 4.3,
and, if I had to guess, not before 4.4.  We learned with Tree-SSA that
replacing our optimizers takes a while to shake out, and I'd imagine the
same would happen with LLVM; even assuming the existing LLVM code is
itself perfect, the LLVM->RTL widget, new LLVM code to support GCC
extensions that aren't currently supported, and bugs exposed in the GCC
backends by the use of different code paths will all take a while to get
right.  We'd also have to look at any performance regressions to work
out whether those issues represent real problems, and how we should deal
with them.

So, since it's going to be a while before we can integrate LLVM, if we
do decide that's best, I hope that we'll continue to improve Tree-SSA in
the meantime.  There are some nice projects in the works, and I'd
like to encourage people to keep working on them.  Moving algorithms
from Tree-SSA to LLVM will no doubt be tractable, and we'll be able to
benefit from new Tree-SSA optimizations, if/until LLVM is integrated.

-- 
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
