Richard Biener wrote:
> On Thu, Feb 26, 2015 at 11:20 PM, Jeff Law <l...@redhat.com> wrote:
> > On 02/26/15 10:30, Wilco Dijkstra wrote:
> >>
> >> Several GCC versions ago a conditional negate optimization was
> >> introduced as a workaround for PR45685. However the branchless
> >> expansion for conditional negate is extremely inefficient on most
> >> targets (5 sequentially dependent instructions rather than 2 on
> >> AArch64). Since the underlying issue has been resolved (the example
> >> in PR45685 no longer generates a branch on x64), remove the
> >> workaround so that conditional negates are treated in exactly the
> >> same way as conditional invert, add, subtract, and, orr, xor etc.
> >> Simple example:
> >>
> >> int f(int x) { if (x > 3) x = -x; return x; }
> >
> > You need to bootstrap and regression test the change before it can
> > be approved.
>
> As Jeff added a testcase for the PHI opt transform to happen, I'm sure
> testing would have shown this as fallout.

Yes, that's the only test that starts to fail. I've changed it to scan
for cmov/csel instead. Bootstrap is fine on AArch64 and x64, of course.
Should the test be moved somewhere else now that it is no longer a
tree-ssa test?
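Concretely, the adjusted check could look something like this (a sketch
only; the options and target selectors are illustrative and would need
checking against the testsuite):

/* { dg-do compile } */
/* { dg-options "-O2" } */

int f(int x) { if (x > 3) x = -x; return x; }

/* Expect a conditional select/move rather than a branch.  */
/* { dg-final { scan-assembler "csel" { target aarch64*-*-* } } } */
/* { dg-final { scan-assembler "cmov" { target i?86-*-* x86_64-*-* } } } */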
> > You should turn this little example into a testcase. It's fine with
> > me if this new test is ARM specific.
> >
> > You should also find a way to change the test
> > gcc.dg/tree-ssa/pr45685.c in such a way that it ensures there aren't
> > any undesirable branches.
>
> I'd also be interested in results of vectorizing a loop with a
> conditional negate. I can very well imagine reverting this patch
> causing code quality regressions there.

Well, vectorized code improves in the same way as you'd expect:

void f(int *p, int *q)
{
  int i;
  for (i = 0; i < 1000; i++)
    p[i] = (q[i] > 3) ? -q[i] : q[i];
}

Before:

.L6:
	vmovdqa	(%r9,%rax), %ymm2
	addl	$1, %edx
	vpcmpgtd	%ymm5, %ymm2, %ymm0
	vpand	%ymm4, %ymm0, %ymm1
	vpsubd	%ymm1, %ymm3, %ymm0
	vpxor	%ymm0, %ymm2, %ymm0
	vpaddd	%ymm0, %ymm1, %ymm0
	vmovups	%xmm0, (%rcx,%rax)
	vextracti128	$0x1, %ymm0, 16(%rcx,%rax)
	addq	$32, %rax
	cmpl	%r8d, %edx
	jb	.L6

After:

.L6:
	vmovdqa	(%r9,%rax), %ymm0
	addl	$1, %edx
	vpcmpgtd	%ymm4, %ymm0, %ymm2
	vpsubd	%ymm0, %ymm3, %ymm1
	vpblendvb	%ymm2, %ymm1, %ymm0, %ymm0
	vmovups	%xmm0, (%rcx,%rax)
	vextracti128	$0x1, %ymm0, 16(%rcx,%rax)
	addq	$32, %rax
	cmpl	%r8d, %edx
	jb	.L6

> > I've got enough history to know this is fixing a regression of sorts
> > for the ARM platform. So once the issues above are addressed it can
> > go forward even without a BZ noting the regression.
>
> But I'd say this is stage1 material at this point.

I suppose it doesn't matter, as it'll have to be backported anyway.
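For completeness, the branchless expansion being removed boils down to
the mask-based form below; a minimal C sketch of the two shapes (the
function names are illustrative, not anything GCC emits):

/* Mask-based form the workaround produced: with c = (x > 3) and
   m = -c, (x ^ m) + c is -x when c is 1 (since -x == ~x + 1) and
   x when c is 0.  Each instruction depends on the previous one,
   matching the long sequential chain described above.  */
int cond_neg_mask(int x)
{
  int c = x > 3;
  int m = -c;
  return (x ^ m) + c;
}

/* Conditional-select form the patch restores; on AArch64 this becomes
   roughly a compare, a negate and a csel.  */
int cond_neg_select(int x)
{
  return x > 3 ? -x : x;
}

Wilco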