[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-06-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #12 from Richard Biener  ---
Author: rguenth
Date: Thu Jun 22 11:46:45 2017
New Revision: 249552

URL: https://gcc.gnu.org/viewcvs?rev=249552=gcc=rev
Log:
2017-06-22  Richard Biener  

Backport from mainline
2017-02-17  Richard Biener  

PR tree-optimization/79552
* tree-ssa-structalias.c (visit_loadstore): Properly verify
default defs.

2016-04-18  Richard Biener  

PR tree-optimization/43434
* tree-ssa-structalias.c (struct vls_data): New.
(visit_loadstore): Handle all pointer-based accesses.
(compute_dependence_clique): Compute a bitmap of restrict tags
assigned bases and pass it to visit_loadstore.

* gcc.dg/vect/pr43434.c: New testcase.
* c-c++-common/goacc/kernels-alias-3.c: Adjust.
* c-c++-common/goacc/kernels-alias-4.c: Likewise.
* c-c++-common/goacc/kernels-alias-5.c: Likewise.
* c-c++-common/goacc/kernels-alias-6.c: Likewise.
* c-c++-common/goacc/kernels-alias-7.c: Likewise.
* c-c++-common/goacc/kernels-alias-8.c: Likewise.
* gcc.dg/gomp/pr68640.c: Likewise.
* gfortran.dg/goacc/kernels-alias-3.f95: Likewise.
* gfortran.dg/goacc/kernels-alias-4.f95: Likewise.

Modified:
branches/gcc-6-branch/gcc/ChangeLog
branches/gcc-6-branch/gcc/testsuite/ChangeLog
branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/kernels-alias-3.c
branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/kernels-alias-4.c
branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/kernels-alias-5.c
branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/kernels-alias-6.c
branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/kernels-alias-7.c
branches/gcc-6-branch/gcc/testsuite/c-c++-common/goacc/kernels-alias-8.c
branches/gcc-6-branch/gcc/testsuite/gcc.dg/gomp/pr68640.c
branches/gcc-6-branch/gcc/testsuite/gfortran.dg/goacc/kernels-alias-3.f95
branches/gcc-6-branch/gcc/testsuite/gfortran.dg/goacc/kernels-alias-4.f95
branches/gcc-6-branch/gcc/tree-ssa-structalias.c

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-06-22 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
  Known to work||6.3.1
 Resolution|--- |FIXED
  Known to fail||6.2.0

--- Comment #11 from Richard Biener  ---
Fixed.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #10 from Richard Biener  ---
Author: rguenth
Date: Fri Feb 17 10:43:27 2017
New Revision: 245528

URL: https://gcc.gnu.org/viewcvs?rev=245528=gcc=rev
Log:
2017-02-17  Richard Biener  

PR tree-optimization/79552
* tree-ssa-structalias.c (visit_loadstore): Properly verify
default defs.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-ssa-structalias.c

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
  Known to work||7.0

--- Comment #9 from Richard Biener  ---
Fixed on trunk sofar.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-17 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

Richard Biener  changed:

   What|Removed |Added

 Depends on||43434

--- Comment #8 from Richard Biener  ---
TO not regress the fix depends on the fix for PR43434.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43434
[Bug 43434] Missed vectorization: "not vectorized: data ref analysis": pointer
incremented by a parameter

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #7 from Uroš Bizjak  ---
(In reply to Katsunori Kumatani from comment #3)

> In my case, I explicitly told GCC what memory I was going to clobber (with
> the "=m"(*m) output operand). In fact, without telling it I was going to
> clobber it, GCC would remove the asm completely because no other output is
> used at all! This dereference struct trick is also explained in the manual
> for Extended Asm, too.

Indeed, I wasn't read the testcase carefully enough.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #6 from Jakub Jelinek  ---
So your patch is effectively revesal of PR48885.  But then we need some other
fix for it.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread jakub at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #5 from Jakub Jelinek  ---
This changed with r228073.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #4 from Richard Biener  ---
Ah...

  FOR_EACH_IMM_USE_STMT (use_stmt, ui, ptr)
{
  /* ???  Calls and asms.  */
  if (!gimple_assign_single_p (use_stmt))
continue;

and at PTA time we see

   [100.00%]:
  c.0_1 = (long unsigned int) c_3(D);
  __asm__("rep stosb" : "=D" a_7, "=c" n_8, "=m" MEM[(struct ._0 *)a_5(D)] :
"a" 0, "0" a_5(D), "1" c.0_1);
  _2 = *a_5(D);
  __asm__ __volatile__("xor %0, %0" :  : "q" _2);
  return;

that's first a missed optimization.  The bug is in visit_loadstore which fails
to verify for default defs whether the pointer aliases a restrict one.

Index: gcc/tree-ssa-structalias.c
===
--- gcc/tree-ssa-structalias.c  (revision 245501)
+++ gcc/tree-ssa-structalias.c  (working copy)
@@ -7296,8 +7312,7 @@ visit_loadstore (gimple *, tree base, tr
   || TREE_CODE (base) == TARGET_MEM_REF)
 {
   tree ptr = TREE_OPERAND (base, 0);
-  if (TREE_CODE (ptr) == SSA_NAME
- && ! SSA_NAME_IS_DEFAULT_DEF (ptr))
+  if (TREE_CODE (ptr) == SSA_NAME)
{
  /* We need to make sure 'ptr' doesn't include any of
 the restrict tags we added bases for in its points-to set.  */

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread katsunori.kumatani at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #3 from Katsunori Kumatani  ---
(In reply to Uroš Bizjak from comment #1)
> (In reply to Katsunori Kumatani from comment #0)
> 
> > Things to note:
> > 
> > This happens on GCC 6 and up to 7 only, GCC 5.4 generates correct output.
> > This happens once you turn on the -fschedule-insns option. So it's a bug
> > there.
> > If you remove the __restrict__ from the pointer in foo's parameter, the
> > problem is gone.
> > Using "asm volatile" instead of "asm" in memset_test generates correct code.
> > Using "memory" clobber in that asm also generates correct code.
> > 
> > 
> > Most of these workarounds are not valid in this context because they DISABLE
> > the optimizations, so it's like preventing the problem from popping up
> > instead of solving it. "memory" clobber is obviously the worst solution by
> > far as it will kill any cached memory in registers. "asm volatile" is
> > probably the least bad workaround, __restrict__ is definitely useful for
> > same types the compiler can't otherwise know they won't alias.
> 
> "memory" clobber is the correct solution here, as it represents an implied
> compiler barrier. Without it, the compiler is free to schedule loads and
> stores around the "rep stosb".
> 
> IOW, it is the "cached memory in registers" instructions that can be
> scheduled around the "rep stosb" without "memory" clobber.

I don't think it's the correct solution at all as it implies arbitrary memory
writes which the compiler can't know about.

In my case, I explicitly told GCC what memory I was going to clobber (with the
"=m"(*m) output operand). In fact, without telling it I was going to clobber
it, GCC would remove the asm completely because no other output is used at all!
This dereference struct trick is also explained in the manual for Extended Asm,
too.

Because the pointer is __restrict__ this means that it cannot alias anything
else, thus this is the optimal solution since GCC will only uncache/reload the
values that are obtained by dereferencing that pointer only (or a derived
pointer). But in this case, it doesn't seem to -- it will load values obtained
from that pointer before the "output operand" which is wrong.

Obviously the "dummy" asm in function foo isn't valid since it modifies an
input, it was just for demonstration purposes.

Hope that makes sense.


Richard, thank you for confirming it.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-02-16
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
   [100.00%]:
  c.0_1 = (long unsigned int) c_3(D);
  __asm__("rep stosb" : "=D" a_6, "=c" n_7, "=m" MEM[(struct ._0 *)a_5(D)
clique 1 base 0] : "a" 0, "0" a_5(D), "1" c.0_1);
  _2 = MEM[(char *)a_5(D) clique 1 base 1];
  __asm__ __volatile__("xor %0, %0" :  : "q" _2);

so we have two different restrict bases here which is obviously wrong.  I'll
see why.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread ubizjak at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

--- Comment #1 from Uroš Bizjak  ---
(In reply to Katsunori Kumatani from comment #0)

> Things to note:
> 
> This happens on GCC 6 and up to 7 only, GCC 5.4 generates correct output.
> This happens once you turn on the -fschedule-insns option. So it's a bug
> there.
> If you remove the __restrict__ from the pointer in foo's parameter, the
> problem is gone.
> Using "asm volatile" instead of "asm" in memset_test generates correct code.
> Using "memory" clobber in that asm also generates correct code.
> 
> 
> Most of these workarounds are not valid in this context because they DISABLE
> the optimizations, so it's like preventing the problem from popping up
> instead of solving it. "memory" clobber is obviously the worst solution by
> far as it will kill any cached memory in registers. "asm volatile" is
> probably the least bad workaround, __restrict__ is definitely useful for
> same types the compiler can't otherwise know they won't alias.

"memory" clobber is the correct solution here, as it represents an implied
compiler barrier. Without it, the compiler is free to schedule loads and stores
around the "rep stosb".

IOW, it is the "cached memory in registers" instructions that can be scheduled
around the "rep stosb" without "memory" clobber.

[Bug inline-asm/79552] [6 Regression] Wrong code generation due to -fschedule-insns, with __restrict__ and inline asm

2017-02-16 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79552

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |6.4
Summary|[Regression GCC 6+] Wrong   |[6 Regression] Wrong code
   |code generation due to  |generation due to
   |-fschedule-insns, with  |-fschedule-insns, with
   |__restrict__ and inline asm |__restrict__ and inline asm