On Sat, Jul 21, 2018 at 3:28 AM Bin.Cheng <amker.ch...@gmail.com> wrote:
>
> On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen <kdnil...@linux.ibm.com> wrote:
> > Thanks for looking at this for me. In simplifying the test case for a bug
> > report, I've narrowed the "problem" to integer overflow considerations. My
> > len variable is declared int, and the target has 64-bit pointers. I'm
> > gathering that the "manual transformation" I quoted below is not considered
> > "equivalent" to the original source code due to different integer overflow
> > behaviors. If I redeclare len to be unsigned long long, then I
> > automatically get the optimizations that I was originally expecting.
> >
> > I suppose this is really NOT a bug?
> As your test case demonstrates, it is caused by wrapping unsigned int32.
> >
> > Is there a compiler optimization flag that allows the optimizer to ignore
> > array index integer overflow in considering legal optimizations?
> I am not aware of one for unsigned integer, and I guess it won't be
> introduced in the future either?
We've had -funsafe-loop-optimizations in the past but that only concerned
niter analysis, not scalar evolution analysis which is likely required here.
And no, there's no plan to re-introduce those.

Richard.

> Thanks,
> bin
> >
> > On 7/13/18 9:14 PM, Bin.Cheng wrote:
> >> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen <kdnil...@linux.ibm.com>
> >> wrote:
> >>> A somewhat old "issue report" pointed me to the code generated for a
> >>> 4-fold manually unrolled version of the following loop:
> >>>
> >>>> while (++len != len_limit) /* this is loop */
> >>>>   if (pb[len] != cur[len])
> >>>>     break;
> >>>
> >>> As unrolled, the loop appears as:
> >>>
> >>>> while (++len != len_limit) /* this is loop */ {
> >>>>   if (pb[len] != cur[len])
> >>>>     break;
> >>>>   if (++len == len_limit) /* unrolled 2nd iteration */
> >>>>     break;
> >>>>   if (pb[len] != cur[len])
> >>>>     break;
> >>>>   if (++len == len_limit) /* unrolled 3rd iteration */
> >>>>     break;
> >>>>   if (pb[len] != cur[len])
> >>>>     break;
> >>>>   if (++len == len_limit) /* unrolled 4th iteration */
> >>>>     break;
> >>>>   if (pb[len] != cur[len])
> >>>>     break;
> >>>> }
> >>>
> >>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the
> >>> only induction variable candidates that are being considered are all
> >>> forms of the len variable. We are not considering any induction
> >>> variables to represent the address expressions &pb[len] and &cur[len].
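A small sketch of why the index type matters, assuming (as Bin's reply indicates) that len behaves as a 32-bit unsigned int on a target with 64-bit pointers. For ivopts to replace &pb[len] with a pointer that simply advances by one element per iteration, the offset of pb[len] after each ++len must always be the previous offset plus one. With a 32-bit unsigned index the increment is performed in 32 bits and wraps from UINT32_MAX to 0 before the value is widened to pointer width, so the address would jump back rather than advance. Unless the compiler can prove the wrap never happens, the address is not a simple affine induction variable. The function names below are hypothetical, and the code only demonstrates the arithmetic, not what GCC does internally.

```c
#include <assert.h>
#include <stdint.h>

/* Element offset of &pb[len] after "++len" when len is a 32-bit unsigned
   index (assumption: this models the "wrapping unsigned int32" case).
   The increment wraps modulo 2^32 before the value is zero-extended to
   64 bits, so UINT32_MAX steps to offset 0, not to offset 2^32. */
static uint64_t offset_after_inc_u32(uint32_t len)
{
    ++len;                 /* UINT32_MAX + 1 wraps to 0 */
    return (uint64_t)len;  /* zero-extension to pointer width */
}

/* The same step with a 64-bit index (e.g. len declared unsigned long long):
   the offset advances by exactly one element. */
static uint64_t offset_after_inc_u64(uint64_t len)
{
    assert(len < UINT64_MAX); /* precondition: the increment must not wrap */
    ++len;
    return len;
}
```

With the 32-bit index, offset_after_inc_u32(UINT32_MAX) yields 0, whereas offset_after_inc_u64(UINT32_MAX) yields 4294967296; the 64-bit variant corresponds to Kelvin's experiment of redeclaring len as unsigned long long.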
> >>>
> >>> I rewrote the source code for this loop to make the addressing
> >>> expressions more explicit, as in the following:
> >>>
> >>>> cur++;
> >>>> while (++pb != last_pb) /* this is loop */ {
> >>>>   if (*pb != *cur)
> >>>>     break;
> >>>>   ++cur;
> >>>>   if (++pb == last_pb) /* unrolled 2nd iteration */
> >>>>     break;
> >>>>   if (*pb != *cur)
> >>>>     break;
> >>>>   ++cur;
> >>>>   if (++pb == last_pb) /* unrolled 3rd iteration */
> >>>>     break;
> >>>>   if (*pb != *cur)
> >>>>     break;
> >>>>   ++cur;
> >>>>   if (++pb == last_pb) /* unrolled 4th iteration */
> >>>>     break;
> >>>>   if (*pb != *cur)
> >>>>     break;
> >>>>   ++cur;
> >>>> }
> >>>
> >>> Now, gcc does a better job of identifying the "address expression
> >>> induction variables". This version of the loop runs about 10% faster
> >>> than the original on my target architecture.
> >>>
> >>> This would seem to be a textbook pattern for the induction variable
> >>> analysis. Does anyone have any thoughts on the best way to add these
> >>> candidates to the set of induction variables that are considered by
> >>> tree-ssa-loop-ivopts.c?
> >>>
> >>> Thanks in advance for any suggestions.
> >>>
> >> Hi,
> >> Could you please file a bug with your original slow test code
> >> attached? I tried to construct a meaningful test case from your code
> >> snippet but was not successful. There is a difference in the generated
> >> assembly, but it's not that fundamental. So a bug with a preprocessed
> >> test would be highly appreciated.
> >> I think there are two potential issues in cost computation for such a
> >> case: invariant expressions, and iv uses outside of the loop being
> >> handled as inside uses.
> >>
> >> Thanks,
> >> bin
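A minimal, runnable sketch of the two loop shapes discussed in this thread, written as rolled loops rather than the 4-fold unrolled versions for brevity. The function names match_len_index and match_len_pointer, the uint8_t element type, the uint32_t type of len, and the preconditions (len < len_limit, both buffers hold at least len_limit elements) are illustrative assumptions, not taken from the original test case. The pointer form mirrors the manual rewrite: cur is bumped once before the loop so that pb and cur then advance in lockstep. It only checks that both forms compute the same result; it says nothing about the generated code or the reported 10% speedup.

```c
#include <assert.h>
#include <stdint.h>

/* Index form, as in the original loop. Returns the first index i with
   len < i < len_limit where pb[i] != cur[i], or len_limit if none. */
static uint32_t match_len_index(const uint8_t *pb, const uint8_t *cur,
                                uint32_t len, uint32_t len_limit)
{
    assert(len < len_limit); /* assumed precondition */
    while (++len != len_limit)
        if (pb[len] != cur[len])
            break;
    return len;
}

/* Pointer form, mirroring the manual rewrite: the addresses themselves
   are the induction variables, advancing one element per iteration. */
static uint32_t match_len_pointer(const uint8_t *pb, const uint8_t *cur,
                                  uint32_t len, uint32_t len_limit)
{
    assert(len < len_limit); /* assumed precondition */
    const uint8_t *start = pb;
    const uint8_t *last_pb = pb + len_limit;
    pb += len;
    cur += len;
    cur++;                    /* the rewrite's leading "cur++" */
    while (++pb != last_pb) {
        if (*pb != *cur)
            break;
        ++cur;                /* stays in lockstep with pb */
    }
    return (uint32_t)(pb - start);
}
```

For two 8-byte buffers that first differ at index 3, both functions return 3 when called with len = 0 and len_limit = 8; for identical buffers both return len_limit.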