On Tue, Jul 17, 2018 at 2:08 AM, Kelvin Nilsen <kdnil...@linux.ibm.com> wrote:
> Thanks for looking at this for me.  In simplifying the test case for a bug
> report, I've narrowed the "problem" to integer overflow considerations.  My
> len variable is declared int, and the target has 64-bit pointers.  I'm
> gathering that the "manual transformation" I quoted below is not considered
> "equivalent" to the original source code due to different integer overflow
> behaviors.  If I redeclare len to be unsigned long long, then I automatically
> get the optimizations that I was originally expecting.
>
> I suppose this is really NOT a bug?
As your test case demonstrates, it is caused by wrapping unsigned int32.
>
> Is there a compiler optimization flag that allows the optimizer to ignore
> array index integer overflow in considering legal optimizations?
I am not aware of one for unsigned integer, and I guess it won't be
introduced in the future either?

Thanks,
bin
>
>
>
> On 7/13/18 9:14 PM, Bin.Cheng wrote:
>> On Fri, Jul 13, 2018 at 6:04 AM, Kelvin Nilsen <kdnil...@linux.ibm.com>
>> wrote:
>>> A somewhat old "issue report" pointed me to the code generated for a 4-fold
>>> manually unrolled version of the following loop:
>>>
>>>>     while (++len != len_limit) /* this is loop */
>>>>         if (pb[len] != cur[len])
>>>>             break;
>>>
>>> As unrolled, the loop appears as:
>>>
>>>>     while (++len != len_limit) /* this is loop */ {
>>>>         if (pb[len] != cur[len])
>>>>             break;
>>>>         if (++len == len_limit) /* unrolled 2nd iteration */
>>>>             break;
>>>>         if (pb[len] != cur[len])
>>>>             break;
>>>>         if (++len == len_limit) /* unrolled 3rd iteration */
>>>>             break;
>>>>         if (pb[len] != cur[len])
>>>>             break;
>>>>         if (++len == len_limit) /* unrolled 4th iteration */
>>>>             break;
>>>>         if (pb[len] != cur[len])
>>>>             break;
>>>>     }
>>>
>>> In examining the behavior of tree-ssa-loop-ivopts.c, I've discovered the
>>> only induction variable candidates that are being considered are all forms
>>> of the len variable.  We are not considering any induction variables to
>>> represent the address expressions &pb[len] and &cur[len].
>>>
>>> I rewrote the source code for this loop to make the addressing expressions
>>> more explicit, as in the following:
>>>
>>>>     cur++;
>>>>     while (++pb != last_pb) /* this is loop */ {
>>>>         if (*pb != *cur)
>>>>             break;
>>>>         ++cur;
>>>>         if (++pb == last_pb) /* unrolled 2nd iteration */
>>>>             break;
>>>>         if (*pb != *cur)
>>>>             break;
>>>>         ++cur;
>>>>         if (++pb == last_pb) /* unrolled 3rd iteration */
>>>>             break;
>>>>         if (*pb != *cur)
>>>>             break;
>>>>         ++cur;
>>>>         if (++pb == last_pb) /* unrolled 4th iteration */
>>>>             break;
>>>>         if (*pb != *cur)
>>>>             break;
>>>>         ++cur;
>>>>     }
>>>
>>> Now, gcc does a better job of identifying the "address expression induction
>>> variables".  This version of the loop runs about 10% faster than the
>>> original on my target architecture.
>>>
>>> This would seem to be a textbook pattern for the induction variable
>>> analysis.  Does anyone have any thoughts on the best way to add these
>>> candidates to the set of induction variables that are considered by
>>> tree-ssa-loop-ivopts.c?
>>>
>>> Thanks in advance for any suggestions.
>>>
>> Hi,
>> Could you please file a bug with your original slow test code
>> attached?  I tried to construct a meaningful test case from your code
>> snippet but was not successful.  There is a difference in the generated
>> assembly, but it's not that fundamental.  So a bug with a preprocessed
>> test would be highly appreciated.
>> I think there are two potential issues in cost computation for such a
>> case: invariant expressions, and iv uses outside of the loop being
>> handled as inside uses.
>>
>> Thanks,
>> bin
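
The wrapping point made at the top of this thread can be checked with a small C sketch. It assumes len is declared uint32_t, matching the "wrapping unsigned int32" remark; the helper names advance_u32, advance_u64, match_len_indexed and match_len_pointer are hypothetical and are not part of the original test case or of GCC. Unsigned arithmetic in C wraps modulo 2^32, so an index that passes UINT32_MAX goes back to 0, whereas a 64-bit index or an incremented pointer keeps moving forward. The indexed loop and the pointer loop therefore agree only while no wrap occurs, which is presumably why the optimizer cannot substitute one for the other on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Advance a 32-bit unsigned index by `steps` increments. */
static uint32_t advance_u32(uint32_t len, unsigned steps)
{
    while (steps--)
        ++len;              /* wraps modulo 2^32; well defined for unsigned */
    return len;
}

/* Same walk with a 64-bit index: no wrap near UINT32_MAX. */
static uint64_t advance_u64(uint64_t len, unsigned steps)
{
    while (steps--)
        ++len;
    return len;
}

/* Indexed form of the loop from the thread, with a 32-bit counter.
   Returns the index at which the loop stops. */
static uint32_t match_len_indexed(const unsigned char *pb,
                                  const unsigned char *cur,
                                  uint32_t len, uint32_t len_limit)
{
    while (++len != len_limit)
        if (pb[len] != cur[len])
            break;
    return len;
}

/* Pointer form, following the manual rewrite from the thread.
   Assumption: len < len_limit, so the pointers never need to wrap. */
static size_t match_len_pointer(const unsigned char *pb,
                                const unsigned char *cur,
                                size_t len, size_t len_limit)
{
    assert(len < len_limit);

    const unsigned char *p = pb + len;
    const unsigned char *c = cur + len + 1;   /* the "cur++" before the loop */
    const unsigned char *last = pb + len_limit;

    while (++p != last) {
        if (*p != *c)
            break;
        ++c;
    }
    return (size_t)(p - pb);
}
```

Starting three steps around the 32-bit limit, advance_u32(UINT32_MAX - 1, 3) ends at 1 while advance_u64 with the same arguments ends at UINT32_MAX + 2, so the two index types address different elements after the wrap. For inputs where len < len_limit, both match functions stop at the same position.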