https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57534
bin cheng <amker at gcc dot gnu.org> changed:

           What    |Removed                         |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org   |amker at gcc dot gnu.org

--- Comment #33 from bin cheng <amker at gcc dot gnu.org> ---
Came back to this one.

void timer_stop();
volatile long keepgoing = 0;
double hand_benchmark_cache_ronly( double *x, long limit, long *oloops, double *ous) {
  long index = 0, loops = 0;
  double sum = (double)0;
  double sum2 = (double)0;
again:
  sum += x[index] + x[index+1] + x[index+2] + x[index+3];
  sum2 += x[index+4] + x[index+5] + x[index+6] + x[index+7];
  if ((index += 8) < limit)
    goto again;
  else if (keepgoing) {
    index = 0;
    goto again;
  }
  timer_stop();
  x[0] = (double)sum + (double)sum2;
  x[1] = (double)index;
}

The ideal fix for the above test would be to identify the first goto as a loop,
so that IVOPTs can do strength reduction on the address ivs.  For the case
below, however:

int ind;
int cond(void);
double hand_benchmark_cache_ronly( double *x) {
  double sum=0.0;
  while (cond())
    sum += x[ind] + x[ind+1] + x[ind+2] + x[ind+3];
  return sum;
}

It's hard to handle in IVOPTs, because neither niter nor scev analysis
succeeds.  The IVOPTs implementation is centered on induction variables, so
supporting such a case would be a non-trivial change.

However, I wonder why we missed slsr in the previous analysis.  It's designed
to strength-reduce such code.  Quoting from its comment:

   Specifically, we are interested in references for which
   get_inner_reference returns a base address, offset, and bitpos as
   follows:

     base:    MEM_REF (T1, C1)
     offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
     bitpos:  C4 * BITS_PER_UNIT

   Here T1 and T2 are arbitrary trees, and C1, C2, C3, C4 are
   arbitrary integer constants.  Note that C2 may be zero, in which
   case the offset will be MULT_EXPR (T2, C3).

   When this pattern is recognized, the original memory reference
   can be replaced with:

     MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
              C1 + (C2 * C3) + C4)

It explicitly states that the addresses here should be tracked, associated and
reduced as we wanted: (X + index * 8) + const_offset_x.  I think it's a missed
address slsr optimization, i.e., it clearly failed to identify a CAND_REF
candidate for the memory reference.  After looking into the code, I think the
problem is in slsr_process_ref and restructure_reference.  I'll see whether I
can fix this...
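
To make the intended transformation concrete, here is a hand-written sketch in C of what the restructuring amounts to for the four loads in the second test. It is only an illustration under stated assumptions, not GCC code or compiler output. It assumes an 8-byte double, and the mapping of x[ind+1] onto the quoted pattern (T1 = x, C1 = 0, T2 = ind, C2 = 1, C3 = 8, C4 = 0, giving a constant offset of 8 from the shared base x + ind * 8) is my reading of the comment, not something checked against a dump. The names restructured_offset, sum4_indexed, sum4_reduced and check_equivalence are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper, not part of GCC: the constant byte offset left
   after restructuring, following the quoted comment:
   C1 + (C2 * C3) + C4.  */
long
restructured_offset (long c1, long c2, long c3, long c4)
{
  return c1 + c2 * c3 + c4;
}

/* Original form: every access recomputes x + (ind + k) * sizeof (double).  */
double
sum4_indexed (const double *x, long ind)
{
  return x[ind] + x[ind + 1] + x[ind + 2] + x[ind + 3];
}

/* Hand-reduced form: the shared part T1 + T2 * C3 is computed once,
   and each access only adds a constant byte offset to it.  */
double
sum4_reduced (const double *x, long ind)
{
  const long c3 = (long) sizeof (double);
  const char *base = (const char *) x + ind * c3;
  double s = 0.0;

  for (long c2 = 0; c2 < 4; c2++)
    s += *(const double *) (base + restructured_offset (0, c2, c3, 0));
  return s;
}

/* Sanity check on sample data: both forms read the same four elements.  */
void
check_equivalence (void)
{
  const double x[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };

  assert (sum4_reduced (x, 2) == sum4_indexed (x, 2));
}
```

Calling check_equivalence () from a test driver exercises both forms on the same data. In the reduced form the multiplication by the element size happens once per group of accesses, which is the effect a CAND_REF candidate chain is meant to achieve.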