https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57534

bin cheng <amker at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |amker at gcc dot gnu.org

--- Comment #33 from bin cheng <amker at gcc dot gnu.org> ---
Came back to this one.


void timer_stop();

volatile long keepgoing = 0;
double hand_benchmark_cache_ronly( double *x, long limit, long *oloops, double *ous) {
        long index = 0, loops = 0;
        double sum = (double)0;
        double sum2 = (double)0;
        again:   sum += x[index] + x[index+1] + x[index+2] + x[index+3];
        sum2 += x[index+4] + x[index+5] + x[index+6] + x[index+7];
        if ((index += 8) < limit)     goto again;
        else if (keepgoing)     {
                index = 0;
                goto again;
        }
        timer_stop();
        x[0] = (double)sum + (double)sum2;
        x[1] = (double)index;
}

The ideal fix for the above test would be to identify the first goto as a loop,
so that IVOPTs can do strength reduction on the address ivs.
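
To make the goal concrete, here is a hand-written sketch of what the inner loop
could look like once the address iv is strength reduced. It is only an
illustration, not compiler output: the function name sum_strength_reduced is
made up, the keepgoing/timer_stop part is dropped, and limit is assumed to be a
positive multiple of 8.

```c
#include <stddef.h>

/* Hypothetical illustration: the inner loop of the test with a single
   pointer induction variable instead of recomputing x + index * 8
   for every reference.  Assumes limit is a positive multiple of 8.  */
double sum_strength_reduced(const double *x, long limit)
{
    double sum = 0.0, sum2 = 0.0;
    const double *p = x;            /* address iv, steps by 8 doubles */
    const double *end = x + limit;  /* loop bound expressed as an address */

    do {
        /* Every reference is now "iv + small constant offset".  */
        sum  += p[0] + p[1] + p[2] + p[3];
        sum2 += p[4] + p[5] + p[6] + p[7];
        p += 8;
    } while (p < end);

    return sum + sum2;
}
```

The do/while shape mirrors the original "goto again" structure, where the body
always runs at least once before the bound is checked.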

For the case below, however:
int ind;
int cond(void);

double hand_benchmark_cache_ronly( double *x) {
    double sum=0.0;
    while (cond())
        sum += x[ind] + x[ind+1] + x[ind+2] + x[ind+3];
    return sum;
}

This is hard to handle in IVOPTs, because neither niter nor scev analysis
succeeds.  The IVOPTs implementation is built around induction variables, so
supporting such a case would be a non-trivial change.

However, I wondered why we missed slsr (straight-line strength reduction) in
the previous analysis.  It's designed to strength reduce exactly this kind of
code.  Quoting from its comment:

   Specifically, we are interested in references for which 
   get_inner_reference returns a base address, offset, and bitpos as
   follows:

     base:    MEM_REF (T1, C1)
     offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
     bitpos:  C4 * BITS_PER_UNIT

   Here T1 and T2 are arbitrary trees, and C1, C2, C3, C4 are 
   arbitrary integer constants.  Note that C2 may be zero, in which
   case the offset will be MULT_EXPR (T2, C3).

   When this pattern is recognized, the original memory reference
   can be replaced with:

     MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
              C1 + (C2 * C3) + C4)

It explicitly states that addresses here should be tracked, associated and
reduced as we wanted:  (X + index * 8) + const_offset_x.
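
As a worked example for the second test case, take the reference x[ind+k] and
assume get_inner_reference presents it in the quoted form with T1 = x, C1 = 0,
T2 = ind, C2 = k, C3 = sizeof (double) and C4 = 0.  The replacement is then
MEM_REF (x + ind * C3, k * C3), so all four references share one base.  The C
sketch below only mimics that shape by hand; the function names and the
load_at helper are made up for illustration.

```c
#include <stddef.h>

/* Original shape: each reference computes (ind + k) * sizeof (double).  */
double sum4_indexed(const double *x, int ind)
{
    return x[ind] + x[ind + 1] + x[ind + 2] + x[ind + 3];
}

/* Hypothetical helper: load a double at a constant byte offset from base,
   playing the role of MEM_REF (base, byte_off).  */
static double load_at(const char *base, size_t byte_off)
{
    return *(const double *)(base + byte_off);
}

/* Restructured shape: one shared base T1 + T2 * C3, then constant
   offsets C1 + C2 * C3 + C4 = k * sizeof (double).  */
double sum4_restructured(const double *x, int ind)
{
    const char *base = (const char *)x
                       + (ptrdiff_t)ind * (ptrdiff_t)sizeof(double);

    return load_at(base, 0 * sizeof(double))
         + load_at(base, 1 * sizeof(double))
         + load_at(base, 2 * sizeof(double))
         + load_at(base, 3 * sizeof(double));
}
```

With an 8-byte double this is the (X + index * 8) + const_offset_x form, with
the multiplication done once instead of four times.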

I think it's a missed address slsr optimization, i.e., it clearly failed to
identify a CAND_REF candidate for the memory reference.  After looking into the
code, I think the problem is in slsr_process_ref and restructure_reference.

I'll see if I can fix this...
