Re: LTO, LLVM, etc.
Mathieu Lacage wrote:
> A path where different solutions for different problems are evolved
> independently and then merged where it makes sense seems better to me
> than a path where a single solution to two different problems is
> attempted from the start. Which is thus why I think that there are
> inherent reasons that you must necessarily have multiple
> representations.

There are a lot of places, in GCC and otherwise, where having a unified framework for things has been a clear advantage. So, I think your statement that genericity is most often bad is too strong; it's bad sometimes, and good other times. You're definitely right that false commonality can lead to bad results; but, on the other hand, a frequent complaint is that people have to write the same code twice because something that could have been shared was not.

That's why I think we should be talking about the effort required to implement the approaches before us, and the payoffs from where those approaches lead us, rather than generalities about design. (And, if you really want a prize, you can put "risk-adjusted" in front of "effort" and "payoffs" above!)

Thanks,

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: LTO, LLVM, etc.
Mark Mitchell <[EMAIL PROTECTED]> writes:
> There is one advantage I see in the LTO design over LLVM's design. In
> particular, the LTO proposal envisions a file format that is roughly at
> the level of GIMPLE. Such a file format could easily be extended to be
> at the source-level version of Tree used in the front-ends, so that
> object files could contain two extra sections: one for LTO and one for
> source-level information. The latter section could be used for things
> like C++ export -- but, more importantly, for other tools that need
> source-level information, like IDEs, indexers, checkers, etc. (All
> tools that presently use the EDG front end would be candidate clients
> for this interface.)

It seems to me that this is clearly useful anyhow. And it seems to me that whether or not we use LTO, LLVM, or neither, we will still want something along these lines. So if anybody is inclined to work on this, they could start now.

Anything that writes out our high level tree representation (GENERIC plus language specific codes) is going to work straightforwardly for our low level tree representation (GIMPLE). And we are going to want to be able to write out the high level representation no matter what.

In short, while this is an important issue, I don't see it as strongly favoring either side. What it means, essentially, is that LTO is not quite as much work as it might otherwise seem to be, because we are going to do some of the work anyhow. So when considering how much work has to be done for LTO compared to how much work has to be done for LLVM, we should take that into account. This is more or less what you said, of course, but I think with a different spin.

If we do switch to LLVM, it's not going to happen before at least 4.3, and, if I had to guess, not before 4.4. Allow me to be the first person to say that if we switch to LLVM, the first release which incorporates it as the default compilation path should be called 5.0.

Ian
Re: LTO, LLVM, etc.
On Saturday 03 December 2005 20:43, Mark Mitchell wrote:
> There is one advantage I see in the LTO design over LLVM's design. In
> particular, the LTO proposal envisions a file format that is roughly at
> the level of GIMPLE. Such a file format could easily be extended to be
> at the source-level version of Tree used in the front-ends, so that
> object files could contain two extra sections: one for LTO and one for
> source-level information. The latter section could be used for things
> like C++ export -- but, more importantly, for other tools that need
> source-level information, like IDEs, indexers, checkers, etc.

I actually see this as a disadvantage.

IMVHO dumping for export and front-end tools and for the optimizers should not be coupled like this. Iff we decide to dump trees, then I would hope the dumper would dump GIMPLE only, not the full front end and middle-end tree representation.

Sharing a tree dumper between the front ends and the middle-end would only make it more difficult again to move to sane data structures for the middle end and to cleaner data structures for the front ends.

Gr.
Steven
Re: LTO, LLVM, etc.
Steven Bosscher <[EMAIL PROTECTED]> writes:

| On Saturday 03 December 2005 20:43, Mark Mitchell wrote:
| There is one advantage I see in the LTO design over LLVM's design. In
| particular, the LTO proposal envisions a file format that is roughly at
| the level of GIMPLE. Such a file format could easily be extended to be
| at the source-level version of Tree used in the front-ends, so that
| object files could contain two extra sections: one for LTO and one for
| source-level information. The latter section could be used for things
| like C++ export -- but, more importantly, for other tools that need
| source-level information, like IDEs, indexers, checkers, etc.
|
| I actually see this as a disadvantage.
|
| IMVHO dumping for export and front-end tools and for the optimizers
| should not be coupled like this.

I'm wondering what the reasons are.

| Iff we decide to dump trees, then I
| would hope the dumper would dump GIMPLE only, not the full front end
| and middle-end tree representation.
|
| Sharing a tree dumper between the front ends and the middle-end would
| only make it more difficult again to move to sane data structures for
| the middle end and to cleaner data structures for the front ends.

Why?

-- Gaby
Re: LTO, LLVM, etc.
On Dec 5, 2005, at 11:48 AM, Steven Bosscher wrote:
> On Saturday 03 December 2005 20:43, Mark Mitchell wrote:
>> There is one advantage I see in the LTO design over LLVM's design. In
>> particular, the LTO proposal envisions a file format that is roughly
>> at the level of GIMPLE. Such a file format could easily be extended to
>> be at the source-level version of Tree used in the front-ends, so that
>> object files could contain two extra sections: one for LTO and one for
>> source-level information. The latter section could be used for things
>> like C++ export -- but, more importantly, for other tools that need
>> source-level information, like IDEs, indexers, checkers, etc.
>
> I actually see this as a disadvantage.
>
> IMVHO dumping for export and front-end tools and for the optimizers
> should not be coupled like this. Iff we decide to dump trees, then I
> would hope the dumper would dump GIMPLE only, not the full front end
> and middle-end tree representation.
>
> Sharing a tree dumper between the front ends and the middle-end would
> only make it more difficult again to move to sane data structures for
> the middle end and to cleaner data structures for the front ends.

I totally agree with Steven on this one. It is *good* for the representation hosting optimization to be different from the representation you use to represent a program at source level. The two have very different goals and uses, and trying to merge them into one representation will give you a representation that isn't very good for either use.

In particular, the optimization representation really does want something in three-address form. The current tree-ssa implementation emulates this (very inefficiently) using trees, but at a significant performance and memory cost. The representation you want for source-level information almost certainly *must* be a tree.

I think it is very dangerous to try to artificially tie link-time (and other) optimization together with source-level clients. The costs are great and difficult to recover from (e.g. as difficult as it is to move the current tree-ssa work to a lighter-weight representation) once the path has been started.

That said, having a good representation for source-level exporting is clearly useful. To be perfectly clear, I am not against a source-level form, I am just saying that it should be *different* than the one used for optimization.

-Chris
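[Editorial sketch] The distinction Chris draws between a tree-shaped source representation and a flat three-address form can be made concrete in a few lines. This is a toy illustration, not GCC or LLVM code; the tuple encoding of trees and the `flatten` helper are invented for the example.

```python
from itertools import count

def flatten(node, out, temps):
    """Lower a nested expression tree into flat three-address instructions."""
    if isinstance(node, str):            # leaf: a variable name
        return node
    op, lhs, rhs = node
    left = flatten(lhs, out, temps)
    right = flatten(rhs, out, temps)
    dest = f"t{next(temps)}"             # fresh temporary for this node
    out.append((dest, op, left, right))  # one instruction, leaf operands only
    return dest

# a = (b + c) * (b - c), written as a source-level expression tree
tree = ("*", ("+", "b", "c"), ("-", "b", "c"))
insns = []
root = flatten(tree, insns, count(1))
for dest, op, left, right in insns:
    print(f"{dest} = {left} {op} {right}")
# t1 = b + c
# t2 = b - c
# t3 = t1 * t2
```

The tree form keeps the nesting a source-level tool wants; the flat form gives each operation its own instruction over leaf operands, which is what a three-address optimizer iterates over.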
Re: LTO, LLVM, etc.
On 12/5/05, Chris Lattner <[EMAIL PROTECTED]> wrote:
> That said, having a good representation for source-level exporting is
> clearly useful. To be perfectly clear, I am not against a source-level
> form, I am just saying that it should be *different* than the one used
> for optimization.

Debug information describes two things: the source program, and its relationship to the machine code produced by the toolchain. The second is much harder to produce; each pass needs to maintain the relation between the code it produces and the compiler's original input. Keeping the two representations separate (which I could easily see being beneficial for optimization) shifts that burden onto some new party which isn't being discussed, and which will be quite complicated.
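[Editorial sketch] The second, harder half of debug information described above can be illustrated with a toy pass. The structures and the one folding rule are invented; the point is only that every instruction an optimizer creates must inherit a source location from the instruction it replaces, or the source-to-machine-code mapping silently degrades.

```python
from dataclasses import dataclass

@dataclass
class Insn:
    dest: str
    op: str
    args: tuple
    line: int                 # source line this instruction came from

def fold_constants(insns):
    """Fold 'add' of two constants, propagating each source location."""
    out = []
    for insn in insns:
        if insn.op == "add" and all(isinstance(a, int) for a in insn.args):
            total = insn.args[0] + insn.args[1]
            # the replacement inherits the original instruction's location
            out.append(Insn(insn.dest, "const", (total,), insn.line))
        else:
            out.append(insn)
    return out

prog = [Insn("t1", "add", (2, 3), line=7),
        Insn("x", "mov", ("t1",), line=7)]
folded = fold_constants(prog)
print(folded[0])   # the folded constant is still attributed to line 7
```

Every transformation in a real compiler carries this obligation, which is why splitting the source-level and optimization representations creates the "new party" the message refers to: something has to keep the two in correspondence.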
Re: LTO, LLVM, etc.
Steven Bosscher wrote:
> On Saturday 03 December 2005 20:43, Mark Mitchell wrote:
>> There is one advantage I see in the LTO design over LLVM's design. In
>> particular, the LTO proposal envisions a file format that is roughly
>> at the level of GIMPLE. Such a file format could easily be extended to
>> be at the source-level version of Tree used in the front-ends, so that
>> object files could contain two extra sections: one for LTO and one for
>> source-level information. The latter section could be used for things
>> like C++ export -- but, more importantly, for other tools that need
>> source-level information, like IDEs, indexers, checkers, etc.
>
> I actually see this as a disadvantage.
>
> IMVHO dumping for export and front-end tools and for the optimizers
> should not be coupled like this. Iff we decide to dump trees, then I
> would hope the dumper would dump GIMPLE only, not the full front end
> and middle-end tree representation.

You and I have disagreed about this before, and I think we will continue to do so. I don't see anything about Tree that I find inherently awful; in fact, it looks very much like what I see in other front ends. There are aspects I dislike (overuse of pointers, lack of type-safety, unnecessary copies of types), but I couldn't possibly justify changing the C++ front-end, for example, to use something entirely other than Tree. That would be a big project, and I don't see much benefit; I think that the things I don't like can be fixed incrementally. (For example, it occurred to me a while back that by fixing the internal type-correctness of expressions, which we want to do anyhow, we could eliminate TREE_TYPE from expression nodes, which would save a pointer.) It's not that I would object to waking up one day to find out that the C++ front-end no longer used Tree, but it just doesn't seem very compelling to me.

> Sharing a tree dumper between the front ends and the middle-end would
> only make it more difficult again to move to sane data structures for
> the middle end and to cleaner data structures for the front ends.

The differences between GIMPLE and C++ Trees are small, structurally; there are just a lot of extra nodes in C++ that never reach GIMPLE. If we had a tree dumper for one, we'd get the other one almost for free. So, I don't think sharing the tree dumper stands in the way of anything; you can still switch either part of the compiler to use non-Tree whenever you like. You'll just need a new dumper, which you would have wanted anyhow.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
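[Editorial sketch] The "almost for free" claim above can be illustrated with a toy generic dumper. The node names below are real GCC tree codes, but the tuple encoding and the walker are invented for the sketch: if C++ trees are structurally GIMPLE trees plus extra node kinds, one generic walker serves both.

```python
def dump(node, depth=0):
    """Render any (code, child, ...) tree as indented lines of text."""
    code, *children = node
    lines = ["  " * depth + code]
    for child in children:
        lines.extend(dump(child, depth + 1))
    return lines

# a GIMPLE-level assignment: x = y + 1
gimple = ("MODIFY_EXPR", ("VAR_DECL",),
          ("PLUS_EXPR", ("VAR_DECL",), ("INTEGER_CST",)))

# a C++-only wrapper node around the same core; the walker needs no changes
cxx = ("CLEANUP_STMT", gimple)

print("\n".join(dump(cxx)))
```

Because the walker dispatches on structure rather than on a fixed node list, adding front-end-only codes extends the dumped vocabulary without touching the code that handles the GIMPLE subset.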
Re: LTO, LLVM, etc.
Chris Lattner wrote:
> I totally agree with Steven on this one. It is *good* for the
> representation hosting optimization to be different from the
> representation you use to represent a program at source level. The two
> have very different goals and uses, and trying to merge them into one
> representation will give you a representation that isn't very good for
> either use.

I don't think that's entirely true. One of the nice things about WHIRL, at least in theory, is that the representation is gradually lowered throughout the compiler, but is never abruptly transitioned, as with GCC's Tree-RTL conversion. So, it's easier to reuse code, instead of having a Tree routine and an RTL routine that do the same thing, as we do in several places in GCC.

As a concrete example, having a control-flow graph in the front-end is very useful, for optimization purposes, diagnostic purposes, and for plugging in domain-specific optimizers and analyzers. It would be nice to have flow-graph code that could be easily used in both places, without having to make that code representation-independent, using adapters to abstract away the actual representation.

That's not to say that I disagree with:

> In particular, the optimization representation really does want
> something in three-address form. The current tree-ssa implementation
> emulates this (very inefficiently) using trees, but at a significant
> performance and memory cost. The representation you want for
> source-level information almost certainly *must* be a tree.

Instead, it's a long-winded way of saying that I don't agree that there's any inherent benefit to using completely different representations, but that I do agree that one wants the right representation for the job, and that Tree-SSA is not the best representation for optimization. So, if Tree-SSA is not replaced, it will almost certainly need to evolve.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: LTO, LLVM, etc.
On Dec 5, 2005, at 5:27 PM, Mark Mitchell wrote:
> Steven Bosscher wrote:
>> IMVHO dumping for export and front-end tools and for the optimizers
>> should not be coupled like this. Iff we decide to dump trees, then I
>> would hope the dumper would dump GIMPLE only, not the full front end
>> and middle-end tree representation.
>
> It's not that I would object to waking up one day to find out that the
> C++ front-end no longer used Tree, but it just doesn't seem very
> compelling to me.

I agree with you. The 'tree' data structure is conceptually what we want for the front-ends to represent the code. They are quite similar in spirit to many AST representations.

>> Sharing a tree dumper between the front ends and the middle-end would
>> only make it more difficult again to move to sane data structures for
>> the middle end and to cleaner data structures for the front ends.
>
> The differences between GIMPLE and C++ Trees are small, structurally;
> there are just a lot of extra nodes in C++ that never reach GIMPLE. If
> we had a tree dumper for one, we'd get the other one almost for free.
> So, I don't think sharing the tree dumper stands in the way of
> anything; you can still switch either part of the compiler to use
> non-Tree whenever you like. You'll just need a new dumper, which you
> would have wanted anyhow.

The point that I'm arguing (and I believe Steven agrees with) is that trees make a poor representation for optimization. Their use in tree-ssa has led to a representation that takes hundreds of bytes and half a dozen separate allocations for each gimple operation. From the efficiency standpoint alone, it doesn't make sense to use trees for optimization.

Further, I would point out that it actually HURTS the front-ends to have the optimizers using trees. We are getting very close to the time when there are not enough tree codes to go around, and there is still a great demand for new ones. Many of these tree codes are front-end specific (e.g. BIND_EXPR and various OpenMP nodes) and many of them are backend specific (e.g. the various nodes for the vectorizer). Having the front-end and the back-end using the same enum *will* have a short-term cost if the size of the tree enum field needs to be increased.

-Chris
Re: LTO, LLVM, etc.
On Dec 5, 2005, at 5:43 PM, Mark Mitchell wrote:
> Chris Lattner wrote:
>> I totally agree with Steven on this one. It is *good* for the
>> representation hosting optimization to be different from the
>> representation you use to represent a program at source level. The two
>> have very different goals and uses, and trying to merge them into one
>> representation will give you a representation that isn't very good for
>> either use.
>
> I don't think that's entirely true. One of the nice things about WHIRL,
> at least in theory, is that the representation is gradually lowered
> throughout the compiler, but is never abruptly transitioned, as with
> GCC's Tree-RTL conversion. So, it's easier to reuse code, instead of
> having a Tree routine and an RTL routine that do the same thing, as we
> do in several places in GCC.

I understand where you are coming from here, and agree with it. There *is* value to being able to share things. However, there is a cost. I have never heard anything good about WHIRL from a compilation time standpoint: the continuous lowering approach does have its own cost. Further, continuous lowering makes the optimizers more difficult to deal with, as they either need to know what 'form' they are dealing with, and/or can only work on a subset of the particular forms (meaning that they cannot be freely reordered).

>> In particular, the optimization representation really does want
>> something in three-address form. The current tree-ssa implementation
>> emulates this (very inefficiently) using trees, but at a significant
>> performance and memory cost. The representation you want for
>> source-level information almost certainly *must* be a tree.
>
> Instead, it's a long-winded way of saying that I don't agree that
> there's any inherent benefit to using completely different
> representations, but that I do agree that one wants the right
> representation for the job, and that Tree-SSA is not the best
> representation for optimization. So, if Tree-SSA is not replaced, it
> will almost certainly need to evolve.

What sort of form do you think it could/would reasonably take? [1] Why hasn't it already happened? Wouldn't it make more sense to do this work independently of the LTO work, as the LTO work *depends* on an efficient IR and tree-ssa would benefit from it anyway?

-Chris

[1] I am just not seeing a better way; this is not a rhetorical question!
Re: LTO, LLVM, etc.
Chris Lattner wrote:

[Up-front apology: If this thread continues, I may not be able to reply for several days, as I'll be travelling. I know it's not good form to start a discussion and then skip out just when it gets interesting, and I apologize in advance. If I'd been thinking better, I would have waited to send my initial message until I returned.]

> I understand where you are coming from here, and agree with it.
> However, there is a cost. I have never heard anything good about WHIRL
> from a compilation time standpoint: the continuous lowering approach
> does have its own cost.

I haven't heard anything either way, but I take your comment to mean that you have heard that WHIRL is slow, and I'm happy to believe that. I'd agree that a data structure capable of representing more things almost certainly imposes some cost over one capable of representing fewer things! So, yes, there's definitely a cost/benefit tradeoff here.

> What sort of form do you think it could/would reasonably take? [1] Why
> hasn't it already happened? Wouldn't it make more sense to do this work
> independently of the LTO work, as the LTO work *depends* on an
> efficient IR and tree-ssa would benefit from it anyway?

To be clear, I'm really not defending the LTO proposal. I stand by my statement that I don't know enough to have a preference! So, please don't read anything more into what's written here than just the plain words.

I did think a little bit about what it would take to make Tree-SSA more efficient. I'm not claiming that there aren't serious or even fatal flaws in those thoughts; this is just a brain dump. I also don't claim to have measurements showing how much of a difference these changes would make.

I'm going to leave TYPE nodes out -- because they're shared with the front-ends, and so will live on anyhow. Similarly for the DECL nodes that correspond to global variables and global functions. So, that leaves EXPR nodes and (perhaps most importantly!) DECLs for local/temporary variables.

The first thing to do would be to simplify the local variable DECLs; all we should really need from such a thing is its type (including its alignment, which, despite historical GCC practice, is part of its type), its name (for debugging), its location relative to the stack frame (if we want to be able to do optimizations based on the location on the stack, which we may or may not want to do at this point), and whatever mark bits or scratch space are needed by optimization passes. The type and name are shared across all SSA instances of the same variable -- so we could use a pointer to a canonical copy of that information. (For user-visible variables, the canonical copy could be the VAR_DECL from the front end.) So, local variables would collapse from 176 bytes (on my system) to something more like 32 bytes.

The second thing would be to modify expression nodes. As I mentioned, I'd eliminate their TYPE fields. I'd also eliminate their TREE_COMPLEXITY fields, which are already nearly unused. There's no reason TREE_BLOCK should be needed in most expressions; it's only needed on nodes that correspond to lexical blocks. Those changes would eliminate a significant amount of the current size (64 bytes) for expressions. I also think it ought to be possible to eliminate the source_locus field; instead of putting it on every expression, insert line-notes into the statement stream, at least by the time we reach the optimizers.

I'd also eliminate uses of TREE_LIST to link together the nodes in CALL_EXPRs; instead use a vector of operands hanging off the end of the CALL_EXPR corresponding to the number of arguments in the call. Similarly, I'd consider using a vector to store statements in a block, rather than a linked list.
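[Editorial sketch] The local-DECL slimming proposed above amounts to roughly this layout. This is a toy model: the field choices and the 176-vs-32-byte figures come from the message, but the classes themselves are invented. Each SSA instance holds only a pointer to a shared canonical record plus its own per-version state.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanonicalVar:
    """Shared, one per source variable; the front end's VAR_DECL could serve."""
    name: str            # kept for debugging
    type: str            # includes alignment, as part of the type

@dataclass
class SSAName:
    """Small per-version record: a pointer to the canonical copy plus scratch."""
    var: CanonicalVar    # shared across all SSA instances of the variable
    version: int
    mark: bool = False   # scratch bit for optimization passes

i = CanonicalVar("i", "int")
versions = [SSAName(i, v) for v in range(3)]   # i_0, i_1, i_2

# every SSA version shares one type/name record rather than carrying its own
assert all(v.var is i for v in versions)
print(f"{versions[2].var.name}_{versions[2].version} : {versions[2].var.type}")
# i_2 : int
```

The saving comes from the sharing: the per-version record shrinks to a pointer, a version number, and whatever scratch bits the passes need.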
Finally, if you wanted, you could flatten expressions so that each expression was, a la LLVM, an instruction, and all operands were leaves rather than themselves trees; that's a subset of the current tree format. I'm not sure that step would in and of itself save memory, but it would be more optimizer-friendly.

In my opinion, the reason this work hasn't been done is that (a) it's not trivial, and (b) there was no sufficiently pressing need. GCC uses a lot of memory, and that's been an issue, but it hasn't been a killer issue in the sense that huge numbers of people who would otherwise have used GCC went somewhere else. Outside of work done by Apple and CodeSourcery, attacking that probably hasn't been (as far as I know?) funded by any companies.

You're correct that LTO, were it to proceed, might make this a killer issue, and then we'd have to attack it -- and so that work should go on the cost list for LTO. You're also correct that some of this work would also benefit GCC as a whole, in that the front-ends would use less memory too, and so you're also correct that there is value in doing at least some of the work independently of LTO -- although
Re: LTO, LLVM, etc.
Steven Bosscher wrote:
> What makes EDG so great is that it represents C++ far closer to the
> actual source code than G++ does.

I know the EDG front-end very well; I first worked with it in 1994, and I have great respect for both the EDG code and the EDG people. I disagree with your use of "far closer" above; I'd say "a bit closer". Good examples of differences are that (before lowering) it has a separate operator for virtual function call (rather than using a virtual function table explicitly) and that pointers-to-member functions are opaque objects, not structures. These are significant differences, but they're not huge differences, or particularly hard to fix in G++. The key strengths of the EDG front-end are its correctness (second to none), cleanliness, excellent documentation, and excellent support. It does what it's supposed to do very well.

> It would be good for G++ to have a representation that is closer to the
> source code than what it has now.

Yes, closing the gap would be good! I'm a big proponent of introducing a lowering phase into G++. So, while I might disagree about the size of the gap, I agree that we should eliminate it. :-)

> I'd be surprised if a compiler exists that runs optimizations on EDG's
> C++ specific representation. I think all compilers that use EDG
> translate EDG's representation to a more low-level representation.

I've worked on several compilers that used the EDG front-end. In all cases, there was eventually translation to different representations, and I agree that you wouldn't want to do all your optimization on EDG IL. However, one compiler I worked on did do a fair amount of optimization on EDG IL, and the KAI inliner also did a lot of optimization (much more than just inlining) on EDG IL. Several of the formats to which I've seen EDG IL translated (WHIRL and a MetaWare internal format, for example) are at about the level of lowered EDG IL (which is basically C with exceptions), which is the form of EDG IL that people use when translating into their internal representation. In some cases, these formats are then again transformed into a lower-level, more RTL-ish format at some point during optimization.

I'm not saying that having two different formats is necessarily a bad thing (we've already got Tree and RTL, so we're really talking about two levels or three), or that switching to LLVM is a bad idea, but I don't think there's any inherent reason that we must necessarily have multiple representations. My basic point is that I want to see the decision be made on the basis of the effort required to achieve our goals, not on our opinions about what we think might be the best design in the abstract. In other words, I don't think that the fact that GCC currently uses the same data structures for front-ends and optimizers is in and of itself a problem -- but I'm happy to switch to LLVM, if we think that it's easier to make LLVM do what we want than it is to make Tree-SSA do what we want.

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
Re: LTO, LLVM, etc.
hi mark,

On Mon, 2005-12-05 at 21:33 -0800, Mark Mitchell wrote:
> I'm not saying that having two different formats is necessarily a bad
> thing (we've already got Tree and RTL, so we're really talking about
> two levels or three), or that switching to LLVM is a bad idea, but I
> don't think there's any inherent reason that we must necessarily have
> multiple representations.

In what I admit is a relatively limited experience (compared to that of you or other gcc contributors) of working with a few large old sucky codebases, I think I have learned one thing: genericity is most often bad. Specifically, I think that trying to re-use the same data structure/algorithms/code for widely different scenarios is what most often leads to large overall complexity and fragility.

It seems to me that the advantages of using the LTO representation for frontend-dumping and optimization (code reuse, etc.) are not worth the cost (a single piece of code used for two very different use-cases will necessarily be more complex and thus prone to design bugs). Hubris will lead developers to ignore the latter because they believe they can avoid the complexity trap of code reuse. It might work in the short term because you and others might be able to achieve this feat but I fail to see how you will be able to avoid the inevitable decay of code inherent to this solution in the long run.

A path where different solutions for different problems are evolved independently and then merged where it makes sense seems better to me than a path where a single solution to two different problems is attempted from the start. Which is thus why I think that there are inherent reasons that you must necessarily have multiple representations.

regards,
Mathieu

PS: I know I am oversimplifying the problem and your position and I apologize for this.

--
LTO, LLVM, etc.
I've been watching the LLVM/LTO discussion with interest. I'm learning that I need to express myself carefully, because people read a lot into what I say, so I've been watching, and talking with lots of people, but not commenting. But, I've gotten a couple of emails asking me what my thoughts are, so here they are! None of what follows is an official position of the FSF, Steering Committee, or even CodeSourcery; it's just my personal thoughts.

First and foremost, I'm not an expert on either Tree-SSA or LLVM, so I'm only qualified to comment at a high level. From what I can see, and by all accounts, LLVM is a clean, well-engineered codebase with good capabilities. Assuming that all of the copyright details are worked out, which Chris is actively trying to do, I think we should consider the costs and benefits of replacing Tree-SSA with LLVM. I'm not sure exactly how the costs and benefits stack up, but we'll see. That shouldn't be read as either a favorable or unfavorable comment about switching; I certainly think we should consider LLVM, but I don't have an opinion as to what the outcome of that consideration ought to be.

For me, the key consideration is the shape of the compiler-goodness graph vs. time, where goodness includes (in no particular order) optimization capability, cross-platform capability, correctness, backwards compatibility, support for link-time optimization, developer happiness, etc. Like some others have suggested, if it were up to me to pick (which it's not, since I don't control the developer base, steering committee, etc.), I'd make a big list of things we would have to do to LLVM and things we would have to do to Tree-SSA, and then decide which one looked easier.

The reason the shape of the graph matters to me, rather than just the value at some time t, is that I'm concerned about increasing GCC's overall market share, and market share is sticky, so, ideally, progress is continuous; periods of flatness, or downtrends, are harmful. However, one clearly doesn't want to win in the short term, only to lose big in the long term, so if one of the LLVM or Tree-SSA lines is significantly higher in the foreseeable future, that's probably a bigger consideration than the shape of the graph in the short term.

If we're opening the door to replacing Tree-SSA, are there any other technologies we should consider? In particular, brushing aside any copyright/patent issues, how would a Tree-WHIRL-RTL widget, using the Open64 technology, stack up relative to Tree-SSA and LLVM? Do any of the Open64 people have interest in integrating with GCC in this way? What are the legal issues and, if there are serious issues, does anyone want to try to resolve them? Again, this should not be read as advocating Open64; these aren't rhetorical questions; I just don't know the answers.

There is one advantage I see in the LTO design over LLVM's design. In particular, the LTO proposal envisions a file format that is roughly at the level of GIMPLE. Such a file format could easily be extended to be at the source-level version of Tree used in the front-ends, so that object files could contain two extra sections: one for LTO and one for source-level information. The latter section could be used for things like C++ export -- but, more importantly, for other tools that need source-level information, like IDEs, indexers, checkers, etc. (All tools that presently use the EDG front end would be candidate clients for this interface.) There's a lot of interest in these kinds of tools, and I think their existence would be a competitive advantage for GCC because they would create compelling reasons to use GCC beyond just its capabilities as a compiler. So, at some point, I think we'll probably want (or even need) to add such an interface to GCC.

LLVM's bytecode is a flat, three-address code style. That's convenient for optimization, and more compact than Tree, but source-level tools actually want tree data structures, complex expressions, and high-level control-flow primitives (so that they can even do things like distinguish a do-loop from a while-loop). So, it would be a drastic change to try to extend LLVM's bytecode format to present source-level information in this way. Nothing about LLVM is a step backwards from where we are today, with respect to this kind of tool integration. It's just that LLVM doesn't particularly advance us in that direction, whereas the infrastructure for the LTO proposal would facilitate this effort, in addition to just LTO.

So, a possible advantage of the LTO proposal in this respect is that it might be a faster path to having both LTO and a source-level interface, and leave us with only one set of routines for reading/writing intermediate code to files. The obvious counter-point is that LLVM is almost certainly a faster path to link-time optimization, since it already works, and that it doesn't in any way prevent us from adding the source-level integration later. The fact