On 03/19/2018 10:11 AM, Richard Biener wrote:
On Fri, 16 Mar 2018, Tom de Vries wrote:

On 03/16/2018 12:55 PM, Richard Biener wrote:
On Fri, 16 Mar 2018, Tom de Vries wrote:

On 02/27/2018 01:42 PM, Richard Biener wrote:
Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c     (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c     (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int foo()
+{
+  int a[10];
+  for(int i = 0; i < 10; ++i)
+    a[i] = i*i;
+  int res = 0;
+  for(int i = 0; i < 10; ++i)
+    res += a[i];
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */

This fails for nvptx, because it doesn't have the required vector
operations.
To fix the failure, I've added a requirement for effective target
vect_int_mult.

On targets that do not vectorize you should see the scalar loops unrolled
instead.  Or do you have only one loop vectorized?

Sort of. Loop vectorization has no effect, and the scalar loops are completely
unrolled. But then slp vectorization vectorizes the stores.

So at optimized we have:
...
   MEM[(int *)&a] = { 0, 1 };
   MEM[(int *)&a + 8B] = { 4, 9 };
   MEM[(int *)&a + 16B] = { 16, 25 };
   MEM[(int *)&a + 24B] = { 36, 49 };
   MEM[(int *)&a + 32B] = { 64, 81 };
   _6 = a[0];
   _28 = a[1];
   res_29 = _6 + _28;
   _35 = a[2];
   res_36 = res_29 + _35;
   _42 = a[3];
   res_43 = res_36 + _42;
   _49 = a[4];
   res_50 = res_43 + _49;
   _56 = a[5];
   res_57 = res_50 + _56;
   _63 = a[6];
   res_64 = res_57 + _63;
   _70 = a[7];
   res_71 = res_64 + _70;
   _77 = a[8];
   res_78 = res_71 + _77;
   _2 = a[9];
   res_11 = _2 + res_78;
   a ={v} {CLOBBER};
   return res_11;
...
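
For illustration, the shape of that dump can be mimicked in plain C. This is only a sketch under stated assumptions: v2si_pair is a hypothetical stand-in for the two-element vector stores (it is not a GCC type), int is assumed to be 4 bytes so that each pair covers 8 bytes as in the dump, and the helper names are made up for this example. It shows that the paired stores followed by ten scalar loads still sum to 285, which is the constant the test scans for once the loads are forwarded from the stores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a two-lane integer vector store; the name
   v2si_pair is an assumption of this sketch, not a GCC type.  */
typedef struct { int lane[2]; } v2si_pair;

/* The original test body: two scalar loops over a[10].  */
int foo_reference(void)
{
  int a[10];
  for (int i = 0; i < 10; ++i)
    a[i] = i * i;
  int res = 0;
  for (int i = 0; i < 10; ++i)
    res += a[i];
  return res;
}

/* Mirrors the dump: five two-element stores, then ten scalar loads.  */
int foo_after_slp(void)
{
  static const v2si_pair init[5] = {
    {{0, 1}}, {{4, 9}}, {{16, 25}}, {{36, 49}}, {{64, 81}}
  };
  int a[10];
  /* Each memcpy plays the role of one paired store at byte offset 8 * k.  */
  for (int k = 0; k < 5; ++k)
    memcpy(&a[2 * k], &init[k], sizeof init[k]);
  int res = 0;
  /* Scalar loads of a[0] .. a[9], accumulated as in the dump.  */
  for (int i = 0; i < 10; ++i)
    res += a[i];
  return res;
}

/* Both forms must agree; a compiler that forwards the stored values to
   the loads can fold either one to a constant.  */
void check_equivalence(void)
{
  assert(foo_reference() == foo_after_slp());
}
```

Calling check_equivalence() from a small main is enough to exercise it. The sketch says nothing about which pass should do the forwarding, only that the value is fully determined at compile time.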

The stores and loads are eliminated by dse1 in the rtl phase, and in the end
we have:
...
.visible .func (.param.u32 %value_out) foo
{
         .reg.u32 %value;
         .local .align 16 .b8 %frame_ar[48];
         .reg.u64 %frame;
         cvta.local.u64 %frame, %frame_ar;
         mov.u32 %value, 285;
         st.param.u32    [%value_out], %value;
         ret;
}
...

That's precisely what the PR was about...  which means it isn't fixed
for nvptx :/

Indeed the assembly is not optimal; it would be if we had optimal code
at optimized.

FWIW, using this patch we generate optimal code at optimized:
...
diff --git a/gcc/passes.def b/gcc/passes.def
index 3ebcfc30349..6b64f600c4a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -325,6 +325,7 @@ along with GCC; see the file COPYING3.  If not see
        NEXT_PASS (pass_tracer);
        NEXT_PASS (pass_thread_jumps);
        NEXT_PASS (pass_dominator, false /* may_peel_loop_headers_p */);
+      NEXT_PASS (pass_fre);
        NEXT_PASS (pass_strlen);
        NEXT_PASS (pass_thread_jumps);
        NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
...

and we get:
...
.visible .func (.param.u32 %value_out) foo
{
         .reg.u32 %value;
         mov.u32 %value, 285;
         st.param.u32    [%value_out], %value;
         ret;
}
...

I could file a missing optimization PR for nvptx, but I'm not sure where this
should be fixed.

Ah, yeah... the usual issue then.

Can you please XFAIL the test on nvptx instead of requiring vect_int_mult?


Done.

Committed as attached.

Thanks,
- Tom
[testsuite] Add nvptx xfail to pr84512.c

2018-03-19  Tom de Vries  <t...@codesourcery.com>

	* gcc.dg/tree-ssa/pr84512.c: Don't require effective target
	vect_int_mult.  Add nvptx xfail for PR84958.

---
 gcc/testsuite/ChangeLog                 | 5 +++++
 gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
index 41b6c06..9560160 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
@@ -1,6 +1,5 @@
 /* { dg-do compile } */
 /* { dg-options "-O3 -fdump-tree-optimized" } */
-/* { dg-require-effective-target vect_int_mult } */
 
 int foo()
 {
@@ -13,4 +12,5 @@ int foo()
   return res;
 }
 
-/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
+/* Target nvptx xfail due to PR84958.  */
+/* { dg-final { scan-tree-dump "return 285;" "optimized" { xfail nvptx*-*-* } } } */
