This is an optimization pass which leads to dramatically better code on
at least one SPEC benchmark.  Ian, Roger, Diego, would one of you care
to review this?

My concern is that as formulated, conditional store elimination is not always a win.

Transforming

   if (cond)
     *p = x;

into

   tmp = *p;
   if (cond)
     tmp = x;
   *p = tmp;
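The following is a minimal C sketch of the two forms above. It uses a hypothetical `store_int` helper that counts writes so the difference is visible. When `cond` is false, the original form performs no store, while the transformed form still writes `*p`.

```c
#include <stdbool.h>

/* Hypothetical instrumentation: counts every write made through it. */
static int store_count = 0;

static void store_int(int *p, int v)
{
    *p = v;
    store_count++;
}

/* Original form: memory is written only when cond is true. */
void cond_store(int *p, bool cond, int x)
{
    if (cond)
        store_int(p, x);
}

/* Transformed form: load into a temporary, conditionally update
   the temporary, then write it back unconditionally. */
void uncond_store(int *p, bool cond, int x)
{
    int tmp = *p;
    if (cond)
        tmp = x;
    store_int(p, tmp);
}
```

Both functions leave `*p` with the same value, but `uncond_store` always costs one memory write.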

on its own effectively turns a conditional write to memory into an unconditional one. On many platforms, even x86, this is a pessimization. For example, the "Intel Architecture Optimization Manual", available at ftp://download.intel.com/design/PentiumII/manuals/24281603.PDF, actually recommends the inverse transformation in section 3.5.5, "Write Allocation Effects". On page 3-21 it shows how the "Sieve of Eratosthenes" benchmark can be sped up on Pentium-class processors by transforming the line

   array[j] = 0;

into the equivalent

   if (array[j] != 0)
     array[j] = 0;

i.e. by introducing conditional stores.
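For illustration, here is a sieve written with the conditional-store form recommended by the manual. The function name and the array convention (nonzero means "still a candidate prime") are assumptions made for this sketch and are not taken from the manual's own listing.

```c
#include <stddef.h>
#include <string.h>

/* Count the primes below n using the Sieve of Eratosthenes.
   flags must have room for n bytes; flags[i] != 0 means i is
   still a candidate prime. */
int sieve_count(unsigned char *flags, int n)
{
    int count = 0;
    if (n < 2)
        return 0;
    memset(flags, 1, (size_t)n);
    flags[0] = flags[1] = 0;
    for (int i = 2; i < n; i++) {
        if (!flags[i])
            continue;
        count++;
        for (int j = 2 * i; j < n; j += i) {
            /* Conditional store: do not write (and dirty) an
               element that has already been cleared. */
            if (flags[j] != 0)
                flags[j] = 0;
        }
    }
    return count;
}
```

The result is identical to the unconditional `flags[j] = 0;` version. Only the number of stores differs.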


The significant observation about Michael Matz's extremely impressive 26% improvement on 456.hmmer is that it comes from the interaction between this transformation and other passes, which allows the conditional store to be hoisted out of a critical loop. By reading the value into "tmp" before the loop, conditionally storing to the register tmp inside the loop, and then unconditionally writing the result back afterwards, we dramatically reduce the number of memory writes, rather than increasing them as happens when this transformation is applied in isolation.
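Below is a sketch of the loop case described above, using hypothetical names (`max_in_place`, `max_hoisted`). The first function may write `*best` on every qualifying iteration. The second keeps the value in a local `tmp` for the whole loop and stores it once afterwards. The sketch assumes that `best` does not alias any element of `a`, which is the kind of condition the compiler has to prove before it can do this.

```c
#include <stddef.h>

/* Conditional store inside the loop: memory may be written
   up to n times. */
void max_in_place(int *best, const int *a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] > *best)
            *best = a[i];
}

/* After hoisting the load and sinking the store: read once before
   the loop, update a register-allocatable temporary inside it, and
   write once after it. Valid only if best does not alias a. */
void max_hoisted(int *best, const int *a, size_t n)
{
    int tmp = *best;
    for (size_t i = 0; i < n; i++)
        if (a[i] > tmp)
            tmp = a[i];
    *best = tmp;
}
```

The hoisted version always performs exactly one store, even when no element exceeds the initial value. That is the unconditional write from the straight-line case, but here it replaces up to n writes instead of adding one.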


I think the correct fix is not to apply this transformation everywhere, but to correctly identify the loop cases where it helps and perform the loop transformation only there. That is, conditional induction variable identification, hoisting and sinking need to be improved, instead of pessimizing code into a simpler form that lets our existing, flawed passes trigger.


I do very much like the loop-restricted version of this transformation, and its impressive impact on HMMER (whose author, Sean Eddy, is a good friend). Perhaps Mark might give revised versions of this patch special dispensation to be applied in stage 3. I wouldn't expect any correctness issues or bugs, just performance trade-offs that need to be investigated. Perhaps we should even apply this patch as is during stage 2, and address the potential non-loop performance degradations in follow-up patches, which would then be regression fixes suitable for stage 3?

Congratulations again to Michael for this impressive performance improvement.

Roger
--
Roger Sayle, Ph.D.
OpenEye Scientific Software,
Suite #D, 9 Bisbee Court,
Santa Fe, New Mexico, 87508.
