On Wed, 2012-03-21 at 13:57 +0000, Richard Earnshaw wrote:
> On 21/03/12 13:40, William J. Schmidt wrote:
> > 
> > And it becomes even more difficult with more complex scenarios.
> > Consider:
> > 
> > a = x + (3 * s);
> > b = x + (5 * s);
> > c = x + (7 * s);
> > 
> > The framework I've developed recognizes that this group of instructions
> > is related, and that it is profitable to replace them as follows:
> > 
> > a = x + (3 * s);
> > t = 2 * s;
> > b = a + t;
> > c = b + t;
> > 
> 
> Given that CPUs often have shift+add, that's not necessarily best
> either.  Also, on pipelined super-scalar systems you're serializing a
> problem when it might be better to improve the parallelism.
> 
> The best sequence on ARM would probably be something like
> 
> a = x + (3 * s);
> b = a + (2 * s); (ADD b, a, s, LSL #1)
> c = a + (4 * s); (ADD c, a, s, LSL #2).
> 

These are good points, and I hope you'll keep an eye on this work as it
proceeds.  I should have been less categorical about stating the
profitability of the transformation.  My intent is that the cost model
will reflect the capabilities of the target machine, and for the machine
I'm most familiar with, the transformation as shown is best.  Getting to
the optimal sequence that you show for ARM could be an interesting
challenge that might require additional logic in the cost model.  I'll
add it to my list of things to think about.

Thanks,
Bill

> 
> R.
> 
