On Mon, Mar 16, 2009 at 12:11 PM, Adam Nemet <ane...@caviumnetworks.com> wrote:
> Ramana Radhakrishnan writes:
>> [Resent because of account funnies. Apologies to those who get this twice]
>>
>> Hi,
>>
>> > > This problem is reported every once in a while, all targets with
>> > small
>> > > load-immediate instructions suffer from this, especially since GCC
>> > 4.0
>> > > (i.e. since tree-ssa).  But it seems there is just not enough
>> > interest
>> > > in having it fixed somehow, or someone would have taken care of it by
>> > > now.
>> > >
>> > > I've summed up before how the problem _could_ be fixed, but I can't
>> > > find where.  So here we go again.
>> > >
>> > > This could be solved in CSE by extending the notion of "related
>> > > expressions" to constants that can be generated from other constants
>> > > by a shift. Alternatively, you could create a simple, separate pass
>> > > that applies CSE's "related expressions" thing in dominator tree
>> > walk.
>> >
>> > See http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00158.html for
>> > handling
>> > something similar when related expressions differ by a small additive
>> > constant.  I am planning to finish this and submit it for 4.5.
>>
>> Wouldn't doing this in CSE only solve the problem within an extended basic
>> block and not necessarily across the program ? Surely you'd want to do it
>> globally or am I missing something very basic here ?
>
> No, you're not.  There are plans moving some of what's in CSE to a new LCM
> (global) pass.  Also note that for a global a pass you clearly need some more
> sophisticated cost model for deciding when CSEing is beneficial.  On a
> multi-scalar architecture, instructions synthesizing consts sometimes appear
> to be "free" whereas holding a value a in a register for an extended period of
> time is not.
>

Right. You probably want something closer to nigel horspool's
"isothermal speculative PRE" which takes into account (using
heuristics and profiles) where the best place to put things is based
on costs, instead of LCM, which uses a notion of "lifetime optimality"

See http://webhome.cs.uvic.ca/~nigelh/pubs.html for "Fast
Profile-Based Partial Redundancy Elimination"

There was a working implementation of this done for GCC 4.1 that used
profile info and execution counts.
If you are interested, and can hunt down David Pereira (He isn't at
uvic anymore, and i haven't talked to him since so i don't have his
email), he'd probably give you the code :)

Reply via email to