[Bug tree-optimization/94963] [11 Regression] Spurious uninitialized warning for static variable building glibc

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94963

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2020-05-06
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
   Target Milestone|--- |11.0

--- Comment #1 from Richard Biener  ---
Confirmed.  I've met the underlying issue when developing the patch and for
this reason marked the conditional store inserted by LIM with no-warning.
But for the testcase that's not enough since now PRE comes along and
optimizes the var.field load away, re-exposing the issue.

LIM transforms the testcase to (simplified a bit)

void
f (void)
{
  if (pv != 0)
    {
      bool v2_set = false;
      bool varfield_set = false;
      int v2tem, varfield_tem;
      for (const P *ph = pv; ph < &pv[ps]; ++ph)
        switch (ph->p1)
          {
          case 1:
            v2tem = ph->p2;
            v2_set = true;
            break;
          case 2:
            varfield_tem = ph->p3;
            varfield_set = true;
            break;
          }
      if (varfield_set)
        var.field = varfield_tem;
      if (v2_set)
        v2 = v2tem;
    }
  if (var.field != 0)
    foo (&var);
}

where the uninit predicate analysis doesn't grok the relation between
varfield_set and varfield_tem being initialized.

The patch changed code generation to elide the previously emitted
unconditional load of v2 and var.field.  I suspected that for
the case where there is no load the loop PHI for varfield_tem
might be eliminated, but it is not in all cases it seems.  Now
apart from marking the store no-warning we could easily initialize
the tems on loop entry, just not with their true value but for example
with zero.  That might result in less optimal out-of-SSA though
(no coalescing with constants, the constant move needs to be emitted...)
at least when the loop PHI is not eliminated.

What works is initializing with an uninitialized variable marked
TREE_NO_WARNING.  I'm going to test that (eliding the no-warning
on the conditional stores).
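
For illustration, the alternative mentioned above (initializing the
temporaries on loop entry with a dummy value such as zero) can be written
out by hand at the source level.  This is only a sketch of the shape of the
transformed code, not what LIM emits: the layout of struct P, the types of
var and v2, and the names scan and check_scan are assumptions made up for
the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the testcase's types and globals.  */
struct P { int p1, p2, p3; };
static struct { int field; } var;
static int v2;

/* Hand-written equivalent of the store-motion result, with the
   temporaries given a dummy initial value on loop entry.  */
static void
scan (const struct P *pv, int ps)
{
  bool v2_set = false;
  bool varfield_set = false;
  int v2tem = 0, varfield_tem = 0;  /* dummy values, stored only if set */
  for (const struct P *ph = pv; ph < &pv[ps]; ++ph)
    switch (ph->p1)
      {
      case 1:
        v2tem = ph->p2;
        v2_set = true;
        break;
      case 2:
        varfield_tem = ph->p3;
        varfield_set = true;
        break;
      }
  if (varfield_set)
    var.field = varfield_tem;
  if (v2_set)
    v2 = v2tem;
}

/* Example use: only a p1 == 2 entry is seen, so v2 keeps its value.  */
static void
check_scan (void)
{
  static const struct P entries[] = { { 2, 0, 7 }, { 3, 0, 0 } };
  var.field = 0;
  v2 = 5;
  scan (entries, 2);
  assert (var.field == 7 && v2 == 5);
}
```

Each temporary is read only under its own flag, so the dummy zero is never
stored; it merely gives every path into the conditional stores a defined
value.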

[Bug tree-optimization/94963] [11 Regression] Spurious uninitialized warning for static variable building glibc

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94963

--- Comment #2 from Richard Biener  ---
Created attachment 48459
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48459&action=edit
patch in testing

Testing the attached.

[Bug tree-optimization/94964] [8/9/10/11 Regression] ICE in add_phi_arg, at tree-phinodes.c:359 since r8-2993-ga7976089dba5e227

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94964

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Target Milestone|--- |8.5
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Mine.  The loop does not have a preheader we can sink to so
gsi_insert_seq_on_edge_immediate will split the edge and the following
add_phi_arg breaks.

Now, the loop entry edge is an EH edge in this case, so I will dig into
what the appropriate solution is.

[Bug tree-optimization/94965] [11 Regression] ICE during SLP since r11-59-g308bc496884706af4b3077171cbac684c7a6f7c6

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94965

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Huh.  mine.

[Bug tree-optimization/94965] [11 Regression] ICE during SLP since r11-59-g308bc496884706af4b3077171cbac684c7a6f7c6

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94965

--- Comment #2 from Richard Biener  ---
@@ -9319,7 +9364,8 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 initialized yet, use first_stmt_info_for_drptr DR by bumping the
 distance from first_stmt_info DR instead as below.  */
   if (!diff_first_stmt_info)
-   msq = vect_setup_realignment (first_stmt_info, gsi, &realignment_token,
+   msq = vect_setup_realignment (loop_vinfo,
+ first_stmt_info, gsi, &realignment_token,
  alignment_support_scheme, NULL_TREE,
  &at_loop);
   if (alignment_support_scheme == dr_explicit_realign_optimized)

that should have been 'vinfo', not 'loop_vinfo'.

[Bug tree-optimization/94965] [11 Regression] ICE during SLP since r11-59-g308bc496884706af4b3077171cbac684c7a6f7c6

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94965

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug c/94968] [10/11 Regression] internal compiler error: tree check: expected class ‘type’, have ‘exceptional’ (error_mark) in useless_type_conversion_p, at gimple-expr.c:87

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94968

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4
   Target Milestone|--- |10.2

[Bug tree-optimization/94969] [8/9/10/11 Regression] Invalid loop distribution

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94969

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.5
   Keywords||wrong-code
   Last reconfirmed||2020-05-06
 Status|UNCONFIRMED |NEW
Summary|Invalid loop distribution   |[8/9/10/11 Regression]
   ||Invalid loop distribution
  Known to work||7.5.0
 Ever confirmed|0   |1

--- Comment #3 from Richard Biener  ---
Confirmed.  Works fine in GCC 7 which also says

Creating dr for f[pretmp_5].e
analyze_innermost: Applying pattern match.pd:84, generic-match.c:11461
failed: bit offset alignment.
base_address:
offset from base address:
constant offset from base address:
step:
aligned to:
base_object: f
Access function 0: 7
Access function 1: pretmp_5

but

(compute_affine_dependence
  stmt_a: f[pretmp_5] = g;
  stmt_b: _2 = f[pretmp_5].e;
) -> dependence analysis failed

instead of

(compute_affine_dependence
  stmt_a: f[pretmp_5] = g;
  stmt_b: _2 = f[pretmp_5].e;
(analyze_overlapping_iterations
  (chrec_a = pretmp_5)
  (chrec_b = pretmp_5)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
)

[Bug tree-optimization/94969] [8/9/10/11 Regression] Invalid loop distribution since r8-2390-gdfbddbeb1ca912c9

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94969

--- Comment #5 from Richard Biener  ---
So I think the issue is not dependence testing but loop distribution accepting
a zero dependence distance as OK.  Of course dependence analysis is quite
useless here since the accesses are to the same location in every iteration.

Bin, maybe you can share your thoughts on this issue?

The testcase doesn't need bitfields - those just disable the cost model
which otherwise prevents the distribution.

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 44423215332..ac272d63c3d 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -2852,6 +2852,7 @@ loop_distribution::finalize_partitions (class loop *loop,
   /* Don't distribute current loop into too many loops given we don't have
  memory stream cost model.  Be even more conservative in case of loop
  nest distribution.  */
+#if 0
   if ((same_type_p && num_builtin == 0
&& (loop->inner == NULL || num_normal != 2 || num_partial_memset != 1))
   || (loop->inner != NULL
@@ -2867,6 +2868,7 @@ loop_distribution::finalize_partitions (class loop *loop,
}
   partitions->truncate (1);
 }
+#endif

   /* Fuse memset builtins if possible.  */
   if (partitions->length () > 1)


makes the testcase miscompile even with the : 7 and : 2 bit-field widths
commented out, that is, with plain

struct S {
  signed m;
  signed e;
};
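
To see why a zero dependence distance is not enough when both accesses hit
the same location in every iteration, here is a small sketch.  The loop body
is an assumption chosen for illustration and is not the testcase from this
PR; fused, distributed and check_distribution are hypothetical names.

```c
#include <assert.h>

struct S { int m; int e; };
static struct S f[2];
static const struct S g = { 0, 1 };

/* Both statements touch f[b] in every iteration.  */
static int
fused (int b, int n)
{
  for (int c = 0; c < n; c++)
    {
      f[b] = g;
      f[b].e ^= 1;
    }
  return f[b].e;
}

/* The same two statements after distribution into two loops.  */
static int
distributed (int b, int n)
{
  for (int c = 0; c < n; c++)
    f[b] = g;
  for (int c = 0; c < n; c++)
    f[b].e ^= 1;
  return f[b].e;
}

/* With two iterations the results differ.  */
static void
check_distribution (void)
{
  assert (fused (0, 2) == 0);
  assert (distributed (0, 2) == 1);
}
```

With a single iteration both variants agree; the difference only shows up
once the store in a later iteration has to overwrite the result of the
update from an earlier one.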

[Bug tree-optimization/94969] [8/9/10/11 Regression] Invalid loop distribution since r8-2390-gdfbddbeb1ca912c9

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94969

--- Comment #6 from Richard Biener  ---
Before Richard's change we likely gave up on the mismatch in access function
dimensionality for f[b] vs. f[b].e but now we compute a dependence distance
of zero.

[Bug tree-optimization/94963] [11 Regression] Spurious uninitialized warning for static variable building glibc

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94963

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Richard Biener  ---
Should be fixed.

[Bug tree-optimization/94964] [8/9/10 Regression] ICE in add_phi_arg, at tree-phinodes.c:359 since r8-2993-ga7976089dba5e227

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94964

Richard Biener  changed:

   What|Removed |Added

  Known to fail|11.0|10.0
Summary|[8/9/10/11 Regression] ICE  |[8/9/10 Regression] ICE in
   |in add_phi_arg, at  |add_phi_arg, at
   |tree-phinodes.c:359 since   |tree-phinodes.c:359 since
   |r8-2993-ga7976089dba5e227   |r8-2993-ga7976089dba5e227
  Known to work||11.0
   Priority|P3  |P2

--- Comment #3 from Richard Biener  ---
Fixed on trunk so far.

[Bug target/94865] Failure to combine unpckhpd+unpcklpd into blendps

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94865

--- Comment #2 from Richard Biener  ---
Missing match.pd patterns also include a no-op combination: inserting an
extracted element at the same position

(simplify
  (bit_insert @0 (BIT_FIELD_REF @0 @size @pos) @pos)
  (if (size matches)
   @0)

in addition to the requested

(simplify
  (bit_insert @0 (BIT_FIELD_REF @1 @rsize @rpos) @ipos)
  (if (@0 and @1 are vectors compatible for a vec_perm)
   (vec_perm @0 @1 { shuffle-mask }))
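
At the source level the two patterns correspond roughly to the following,
written with the generic vector extension.  This is a sketch only: the v2df
typedef, the lane index and the function names are assumptions for the
example, and it says nothing about how the match.pd patterns would be
spelled.

```c
#include <assert.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Insert lane 1 of A back into A at the same position: a no-op.  */
static v2df
reinsert_same (v2df a)
{
  double lane = a[1];
  a[1] = lane;
  return a;
}

/* Insert lane 1 of B into A: a two-input permute selecting
   { a[0], b[1] }.  */
static v2df
insert_from_other (v2df a, v2df b)
{
  a[1] = b[1];
  return a;
}

static void
check_lanes (void)
{
  v2df a = { 1.0, 2.0 }, b = { 3.0, 4.0 };
  v2df r = reinsert_same (a);
  assert (r[0] == 1.0 && r[1] == 2.0);
  r = insert_from_other (a, b);
  assert (r[0] == 1.0 && r[1] == 4.0);
}
```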

[Bug c++/94973] compile error when trying to use pointer-to-member function as invokable projection to ranges::find()

2020-05-06 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94973

--- Comment #13 from Richard Biener  ---
Does MSVC still accept that [without diagnostic]?  Maybe it's time to remove it
completely...

[Bug fortran/94978] [8/9/10/11 Regression] Bogus warning "Array reference at (1) out of bounds in loop beginning at (2)"

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94978

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |8.5
   Keywords||diagnostic

[Bug target/94865] Failure to combine unpckhpd+unpcklpd into blendps

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94865

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
  Known to work||11.0
  Known to fail||10.0

--- Comment #33 from Richard Biener  ---
Fixed on trunk.

[Bug target/94980] [8/9/10/11 Regression] ICE: verify_gimple failed: position plus size exceeds size of referenced object in 'bit_field_ref' with -mavx512vl

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94980

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
   Priority|P3  |P2
   Target Milestone|--- |8.5

[Bug rtl-optimization/94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Richard Biener  ---
Addressed by the patch for PR94865.

[Bug tree-optimization/88540] Issues with vectorization of min/max operations

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88540

Richard Biener  changed:

   What|Removed |Added

 Blocks||94864
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Mine.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864
[Bug 94864] Failure to combine vunpckhpd+movsd into single vunpckhpd

[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
  Known to work||11.0

--- Comment #6 from Richard Biener  ---
Fixed for GCC 11.

[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

2020-05-07 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Last reconfirmed||2020-05-08
 Status|UNCONFIRMED |ASSIGNED
 Blocks||57359
   Target Milestone|--- |11.0
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Ah, forgot to update this testcase.  This is another instance of PR57359, that
is, we may not sink the store to b across the store to *b since b may point
to itself and with j == 1 we'd change

 b = b + 2;
 *b = x;

to

 *b = x;
 b = b + 2;

Note there's a twist for this particular case, namely the preceding load
of 'b' gives us knowledge about the dynamic type of 'b' which means we
could use that to assess that we _can_ exchange the stores.

But that logic is not implemented.

I'll see how to do that.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
[Bug 57359] store motion causes wrong code for union access at -O3

[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

2020-05-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

--- Comment #2 from Richard Biener  ---
(In reply to Richard Biener from comment #1)
> Ah, forgot to update this testcase.  This is another instance of PR57359,
> that is, we may not sink the store to b across the store to *b since b may
> point to itself and with j == 1 we'd change
> 
>  b = b + 2;
>  *b = x;
> 
> to
> 
>  *b = x;
>  b = b + 2;
> 
> note there's a twist for this particular case, namely the preceding load
> of 'b' gives us knowledge about the dynamic type of 'b' which means we
> could use that to assess that we _can_ exchange the stores.
> 
> But that logic is not implemented.
> 
> I'll see how to do that.

OK, we can't.  Consider the following which we miscompile with GCC 10
but which is fixed on trunk.  bar () is simply the inner loop of
bar in the pr64110.c testcase.  GCC 10 and earlier transform

  b++;
  *b = x;

to

  tem = b + 1;
  *b = x;
  b = tem;

which is wrong with b == &b, since the *b = x store re-purposes the
storage in 'b'.

short *b;

void __attribute__((noipa))
bar (short x, int j)
{
  for (int i = 0; i < j; ++i)
    *b++ = x;
}

int
main()
{
  b = (short *)&b;
  bar (0, 1);
  if ((short)(unsigned long)b != 0)
    __builtin_abort ();
  return 0;
}

Now the only thing that can be done (as noted in PR57359) is
re-materializing _both_ stores on the exit.   Thus turn

  for (int i = 0; i < j; ++i)
    {
      tem = b;
      tem = tem + 1;
      b = tem;
      *tem = x;
    }

into

  tem = b;
  for (int i = 0; i < j; ++i)
    {
      tem = tem + 1;
      *tem = x;
    }
  b = tem;
  *tem = x;

when applying store-motion.  Note this only works when b is written to
unconditionally.  It also needs some kind of a cost model I guess...
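
The order dependence between the two stores can also be reproduced without
the short/pointer pun of the testcase.  The following is a type-safe
analogue, assuming a hypothetical struct cell whose only member is the
cursor itself; it is a sketch of why the stores cannot be exchanged and not
code from either PR.

```c
#include <assert.h>
#include <stddef.h>

struct cell { struct cell *p; };

static struct cell c;

/* Source order: advance the cursor, then store through its old value.  */
static struct cell *
source_order (struct cell *x)
{
  c.p = &c;                 /* the cursor points at its own storage */
  struct cell *tem = c.p;
  c.p = tem + 1;
  tem->p = x;               /* overwrites c.p because tem == &c */
  return c.p;
}

/* The two stores exchanged, as the invalid transform would do.  */
static struct cell *
exchanged_order (struct cell *x)
{
  c.p = &c;
  struct cell *tem = c.p;
  tem->p = x;
  c.p = tem + 1;            /* the store of x is lost */
  return c.p;
}

static void
check_order (void)
{
  assert (source_order (NULL) == NULL);
  assert (exchanged_order (NULL) == &c + 1);
}
```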

[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-05-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

--- Comment #9 from Richard Biener  ---
(In reply to Rainer Orth from comment #7)
> Created attachment 48483 [details]
> 32-bit sparc-sun-solaris2.11 pr94703.c.021t.ssa
> 
> The new testcase FAILs on sparc-sun-solaris2.11 (both 32 and 64-bit):
> 
> +FAIL: gcc.dg/tree-ssa/pr94703.c scan-tree-dump ssa "No longer having address taken: r"

Hmm, OK looks like memcpy is not folded, likely because the source is
not known to be appropriately aligned.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c b/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c
index 7209fa0a4d4..eadea45a32f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr94703.c
@@ -4,6 +4,7 @@
 unsigned int set_lowpart (unsigned int const *X)
 {
   unsigned int r = 0;
+  X = __builtin_assume_aligned (X, sizeof (unsigned int) / 2);
   __builtin_memcpy(&r,X,sizeof (unsigned int) / 2);
   return r;
 }

should fix this.  Can you verify and if so, commit?  Thx.
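
For reference, this is how the function from the hunk reads with the added
line applied, plus a small check that is not part of the testcase.  The
check_lowpart helper is an assumption added for the example; it only
verifies that exactly half of the object was copied, so it does not depend
on endianness.

```c
#include <assert.h>

unsigned int
set_lowpart (unsigned int const *X)
{
  unsigned int r = 0;
  /* Tell the compiler X is aligned to at least the size of the copy.  */
  X = __builtin_assume_aligned (X, sizeof (unsigned int) / 2);
  __builtin_memcpy (&r, X, sizeof (unsigned int) / 2);
  return r;
}

/* Hypothetical check: half of the bytes are copied, the rest stay zero.  */
static void
check_lowpart (void)
{
  unsigned int all_ones = ~0u;
  unsigned int r = set_lowpart (&all_ones);
  assert (r != 0 && r != all_ones);
}
```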

[Bug tree-optimization/95001] std::terminate() and abort() do not have __builtin_unreachable() semantics

2020-05-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95001

--- Comment #1 from Richard Biener  ---
Sorry, but noreturn functions can have side-effects that need to be preserved.
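
A minimal sketch of such a function, assuming a hypothetical handler fail
that records state and then leaves via longjmp instead of returning.  If
the call were treated like __builtin_unreachable (), the store it performs
could be dropped together with the call.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf env;
static volatile int side_effect;

/* Never returns to its caller, but has an observable side effect.  */
static void __attribute__ ((noreturn))
fail (void)
{
  side_effect = 42;
  longjmp (env, 1);
}

/* Returns the recorded value if fail was called, 0 otherwise.  */
static int
run (int trigger)
{
  side_effect = 0;
  if (setjmp (env) == 0)
    {
      if (trigger)
        fail ();
      return 0;
    }
  return side_effect;
}

static void
check_noreturn (void)
{
  assert (run (0) == 0);
  assert (run (1) == 42);
}
```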

[Bug bootstrap/94998] GCC 10 won't configure for host=x86, build!=host, linker=bfd due to CET

2020-05-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94998

Richard Biener  changed:

   What|Removed |Added

 Status|WAITING |NEW
  Component|target  |bootstrap
   Host||x86_64-linux

--- Comment #2 from Richard Biener  ---
Ugh.

[Bug middle-end/94994] [10/11 Regression] possible miscompilation of word-at-a-time copy via packed structs

2020-05-08 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2020-05-08
   Target Milestone|--- |10.2
 Status|UNCONFIRMED |NEW
   Priority|P3  |P2
   Keywords||wrong-code
 Ever confirmed|0   |1

--- Comment #2 from Richard Biener  ---
Confirmed.

[Bug middle-end/95021] [10/11 Regression] Bogus -Wclobbered warning

2020-05-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95021

Richard Biener  changed:

   What|Removed |Added

   Keywords||diagnostic
 CC|rguenther at suse dot de   |law at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
 Target||x86_64-*-*

--- Comment #3 from Richard Biener  ---
IIRC Jeff was working on replacing -Wclobbered

[Bug target/95023] Offloading AMD GCN wiki cannot be followed

2020-05-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95023

Richard Biener  changed:

   What|Removed |Added

 Target||gcn
   Keywords||documentation

--- Comment #1 from Richard Biener  ---
It's upstream newlib, https://sourceware.org/newlib/

[Bug regression/95025] [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813

2020-05-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

[Bug sanitizer/95033] [11 Regression] ICE in set_parm_rtl, at cfgexpand.c:1310 since r11-165-geb72dc663e9070b2

2020-05-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95033

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/95045] wrong code at -O3 on x86_64-linux-gnu

2020-05-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed||2020-05-11
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Mine.

[Bug regression/95025] [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025

--- Comment #2 from Richard Biener  ---
(In reply to David Binderman from comment #1)
> I see this bug also. Another C test case is available on request.

Please attach it.

[Bug tree-optimization/95045] [11 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045

--- Comment #2 from Richard Biener  ---
OK, this one is an interesting one (might be also latent before the rewrite). 
I'll deal with it separately.  The issue is around the inner loop having
multiple exits, one being also the exit from the outer loop and edge
inserts on that edge getting mis-ordered (we commit them only after processing
all inserts).

[Bug tree-optimization/95049] GCC never terminates with trivial input program

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95049

Richard Biener  changed:

   What|Removed |Added

  Component|c   |tree-optimization
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-11
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Richard Biener  ---
Mine.

[Bug c/95052] Excess padding of partially initialized strings/char arrays

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95052

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-11
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
I'm not sure what you describe as padding is padding.  Instead it's valid to
access all elements of the array you declare and thus it must be initialized.

What could be done is to turn the zero-padding parts into a memset() call.
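
As a sketch of both points, assuming an arbitrary 64-byte array and the
initializer "abc" (neither is taken from the report): every element past
the initializer must read as zero, and the same object contents can be
produced by copying the explicit part and clearing the rest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All elements not covered by the initializer are zero.  */
static int
tail_is_zero (void)
{
  char buf[64] = "abc";
  for (size_t i = 3; i < sizeof buf; i++)
    if (buf[i] != 0)
      return 0;
  return 1;
}

/* The same contents built from a copy of the prefix plus a memset.  */
static int
matches_memset_form (void)
{
  char a[64] = "abc";
  char b[64];
  memcpy (b, "abc", 3);
  memset (b + 3, 0, sizeof b - 3);
  return memcmp (a, b, sizeof a) == 0;
}

static void
check_init (void)
{
  assert (tail_is_zero ());
  assert (matches_memset_form ());
}
```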

[Bug tree-optimization/95051] error: invalid RHS for gimple memory store:

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95051

Richard Biener  changed:

   What|Removed |Added

  Component|c   |tree-optimization
Version|unknown |11.0
 CC||marxin at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 Depends on||95033
   Last reconfirmed||2020-05-11

--- Comment #3 from Richard Biener  ---
Confirmed, looks related to PR95033

The ICE occurs in sanopt


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95033
[Bug 95033] [11 Regression] ICE in set_parm_rtl, at cfgexpand.c:1310 since r11-165-geb72dc663e9070b2

[Bug tree-optimization/95049] [9/10/11 Regression] GCC never terminates with trivial input program

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95049

Richard Biener  changed:

   What|Removed |Added

Summary|GCC never terminates with   |[9/10/11 Regression] GCC
   |trivial input program   |never terminates with
   ||trivial input program
   Target Milestone|--- |9.4
   Priority|P3  |P2

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 90668, which changed state.

Bug 90668 Summary: loop invariant moving a dependent store out of a loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90668

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/90668] loop invariant moving a dependent store out of a loop

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90668

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #5 from Richard Biener  ---
Dup.

*** This bug has been marked as a duplicate of bug 57359 ***

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359

Richard Biener  changed:

   What|Removed |Added

 CC||msebor at gcc dot gnu.org

--- Comment #34 from Richard Biener  ---
*** Bug 90668 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/95056] [11 Regression] slp-perm-9.c fails on aarch64 after gbc484e250990393e887f7239157cc85ce6fadcce

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95056

Richard Biener  changed:

   What|Removed |Added

Version|10.0|11.0
  Component|target  |tree-optimization
   Target Milestone|--- |11.0
   Keywords||missed-optimization
Summary|slp-perm-9.c fails on   |[11 Regression]
   |aarch64 after   |slp-perm-9.c fails on
   |gbc484e250990393e887f723915 |aarch64 after
   |7cc85ce6fadcce  |gbc484e250990393e887f723915
   ||7cc85ce6fadcce

--- Comment #1 from Richard Biener  ---
Hmm, load-lane support should be unaffected (but I didn't test obviously).  I
hope aarch64 folks can investigate - eventually the permute check done in
vectorizable_load needs adjustment / moving.

[Bug target/95055] [11 Regression] gcc.dg/compat/scalar-by-value-3 fails on aarch64 after r11-165-geb72dc663e9070b281be83a80f6f838a3a878822

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95055

Richard Biener  changed:

   What|Removed |Added

Summary|gcc.dg/compat/scalar-by-val |[11 Regression]
   |ue-3 fails on aarch64 after |gcc.dg/compat/scalar-by-val
   |r11-165-geb72dc663e9070b281 |ue-3 fails on aarch64 after
   |be83a80f6f838a3a878822  |r11-165-geb72dc663e9070b281
   ||be83a80f6f838a3a878822
   Target Milestone|--- |11.0
Version|10.0|11.0
 CC||rguenth at gcc dot gnu.org
   Keywords||wrong-code

[Bug fortran/95053] [11.0 regression] ICE in f951: gfc_divide()

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95053

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/95058] [11 regression] vector test case failures starting with r11-205

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95058

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0
  Component|other   |tree-optimization

--- Comment #1 from Richard Biener  ---
Can you attach the dumps for power7 and "the rest"?

[Bug regression/95025] [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Richard Biener  ---
Fixed.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 95025, which changed state.

Bug 95025 Summary: [11 Regression] ICE in execute_sm_exit at gcc/tree-ssa-loop-im.c:2224 since r11-161-g283cb9ea6293e813
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95025

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 94988, which changed state.

Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/95045] [11 Regression] wrong code at -O3 on x86_64-linux-gnu

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 95045, which changed state.

Bug 95045 Summary: [11 Regression] wrong code at -O3 on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95045

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug libgomp/95062] [10/11 Regression] libgomp.oacc-c-c++-common/pr92843-1.c:26: verify_array: Assertion `array[i] == value' failed

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95062

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.2

[Bug c++/95063] [11 Regression] ICE in tsubst_decl, at cp/pt.c:14633

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95063

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug tree-optimization/95060] vfnmsub132ps is not generated with -ffast-math

2020-05-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95060

Richard Biener  changed:

   What|Removed |Added

Version|unknown |11.0
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-12
   Keywords||missed-optimization
 Ever confirmed|0   |1
 Target||x86_64-*-* i?86-*-*

--- Comment #3 from Richard Biener  ---
FMA generation already folds the FMA stmt:

  if (cond)
    fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
                                           op2, addop, else_value);
  else
    fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
  gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
  gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
                                                               use_stmt));
  gsi_replace (&gsi, fma_stmt, true);
  /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
     regardless of where the negation occurs.  */
  gimple *orig_stmt = gsi_stmt (gsi);
  if (fold_stmt (&gsi, follow_all_ssa_edges))
    {
      if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
        gcc_unreachable ();
      update_stmt (gsi_stmt (gsi));

but not the negate it feeds since with -ffast-math we have
-((a[i] * b[i]) + c[i]) as canonical form it seems (reassoc does this).

float r[8], a[8], b[8], c[8];

void
test_fnms (void)
{
  for (int i = 0; i < 8; i++)
    r[i] = -((a[i] * b[i]) + c[i]);
}

would be an alternative testcase, not handled without -ffast-math either.

I'd suggest folding the single-use stmt of the fma_stmt's lhs, if any
[and if it is a negate].
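
For reference, a scalar sketch of the two shapes involved: the negated
multiply-add that appears as canonical form, and the -(a * b) - c shape
that an FNMS operation computes.  The function names are made up for the
example, and the check uses small exactly representable values so that
rounding plays no role.

```c
#include <assert.h>

/* Canonical form seen here: negate the multiply-add result.  */
static float
negated_madd (float a, float b, float c)
{
  return -((a * b) + c);
}

/* FNMS shape: negated product minus the addend.  */
static float
fnms_shape (float a, float b, float c)
{
  return -(a * b) - c;
}

static void
check_fnms (void)
{
  assert (negated_madd (2.0f, 3.0f, 4.0f) == -10.0f);
  assert (fnms_shape (2.0f, 3.0f, 4.0f) == -10.0f);
}
```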

[Bug fortran/95067] [9/10/11 Regression] ICE in tree_fits_shwi_p, at tree.c:7262

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95067

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |9.4

--- Comment #2 from Richard Biener  ---
That commit looks totally unrelated ... but it's eventually that

  /* If there was an input error and we don't really have a type,
     avoid crashing and write something that is at least valid
     by assuming `int'.  */
  if (type == error_mark_node)
    type = integer_type_node;

in dbxout_type makes us later use uninitialized low/high.  Using
void_type_node might be less error-prone here.

Untested suggestion, that is.  Take it or leave it ;)  (stabs should go away)

[Bug middle-end/95072] [10/11 Regression] -Warray-bounds false positive with flexible array bounds (regression from GCC 9)

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95072

Richard Biener  changed:

   What|Removed |Added

Summary|-Warray-bounds false|[10/11 Regression]
   |positive with flexible  |-Warray-bounds false
   |array bounds (regression|positive with flexible
   |from GCC 9) |array bounds (regression
   ||from GCC 9)
   Priority|P3  |P2
   Target Milestone|--- |10.2

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #19 from Richard Biener  ---
Is libgfortran built with -O2 -funroll-loops or with -O3 (IIRC -O3?).  Note we
see

Estimating sizes for loop 3
 BB: 14, after_exit: 0
  size:   1 _20 = count[n_95];
  size:   1 _21 = _20 + 1;
  size:   1 count[n_95] = _21;
  size:   1 _22 = stride[n_95];
  size:   0 _23 = (long unsigned int) _22;
  size:   1 _44 = _23 - _82;
  size:   1 _45 = _44 * 4;
  size:   1 src_62 = src_85 + _45;
  size:   1 _25 = extent[n_95];
  size:   2 if (_21 == _25)
 BB: 20, after_exit: 1
 BB: 13, after_exit: 0
  size:   1 count[n_95] = 0;
  size:   1 _18 = _22 * _25;
  size:   0 _19 = (long unsigned int) _18;
  size:   1 n_60 = n_95 + 1;
   Induction variable computation will be folded away.
  size:   2 if (dim_43 == n_60)
   Exit condition will be eliminated in last copy.
size: 15-1, last_iteration: 15-3
  Loop size: 15
  Estimated size after unrolling: 129
Making edge 13->20 impossible by redistributing probability to other edges.
../../../trunk/libgfortran/generated/in_pack_i4.c:100:14: optimized: loop with
13 iterations completely unrolled (header execution count 23565294)
Last iteration exit edge was proved true.

Note even with the rs6000 limits turned back to default I see the loop
unrolled (with -O3 or -O2 -funroll-loops).

Checking on x86_64 the file is compiled with -O2 only and we have

size: 17-1, last_iteration: 10-3
  Loop size: 17
  Estimated size after unrolling: 154
Not unrolling loop 3: size would grow.

so what's the speciality on POWER?  Code growth should trigger with -O3 only.
Given we have only a guessed profile (and that does not detect the inner
loop as completely cold) we're allowing growth then.  GCC has no idea the
outer loop iterates more than the inner.

Note re-structuring the loop to use down-counting count[] from extent[] to zero
would be worth experimenting with, likewise "peeling" the dim == 0 loop
and not making the outermost loop key on 'src' (can 'src' be NULL on entry?).

Anyway, completely peeling this loop looks useless - the only benefit
might be better branch prediction (each dimension gets its own entry
in the predictor cache).

If POWER cannot cope with large loops then I wonder why POWER people
increased limits (though even the default limits would unroll the loop).

Thomas - where did you measure the slowness?  For which dimensionality?
I'm quite sure the loop structure will be sub-optimal for certain
input shapes... (stride0 == 1 could even use memcpy for the inner dimension).

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #20 from Richard Biener  ---
(In reply to Jiu Fu Guo from comment #18)
> Currently, I'm thinking to enhance GCC 'cunroll' as:
> if the loop has multi-exits or upbound is not a fixed number, we may not do
> 'complete unroll' for the loop, except -funroll-all-loops is specified.

That doesn't make much sense (-funroll-all-loops is RTL unroller only).

I think the growth limits are simply too large unless we compute a "win"
which we in this case do not.  So I'd say the growth limits should scale
with win ^ (1/new param) thus if we estimate to eliminate 20% of the
loop stmts due to unrolling then the limit to apply is
limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.

I'd only apply this new limit for peeling (peeling is when the loop count
is not constant and thus we keep the exit tests).

Of course people want more peeling (hello POWER people!)
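
As a rough sketch of that scaling (the function names are hypothetical and
X stands for the not yet existing param), the limit could be computed like
this.  The root is approximated by bisection only to keep the example free
of libm; this is an illustration, not a patch.

```c
#include <assert.h>

/* Approximate win^(1/x) for 0 <= win <= 1 and integer x >= 1 by
   bisection on [0, 1].  */
static double
root_of_win (double win, int x)
{
  double lo = 0.0, hi = 1.0;
  for (int iter = 0; iter < 60; iter++)
    {
      double mid = (lo + hi) / 2;
      double p = 1.0;
      for (int i = 0; i < x; i++)
        p *= mid;
      if (p < win)
        lo = mid;
      else
        hi = mid;
    }
  return (lo + hi) / 2;
}

/* limit * win^(1/x): the smaller the estimated win, the less growth
   is allowed.  */
double
scaled_growth_limit (double limit, double win, int x)
{
  assert (win >= 0.0 && win <= 1.0 && x >= 1);
  return limit * root_of_win (win, x);
}
```

With a limit of 100, an estimated win of 20% and X = 2 this allows a growth
of roughly 44.7; with X = 1 it would be 20.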

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #23 from Richard Biener  ---
(In reply to Richard Biener from comment #20)
> (In reply to Jiu Fu Guo from comment #18)
> > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > if the loop has multi-exits or upbound is not a fixed number, we may not do
> > 'complete unroll' for the loop, except -funroll-all-loops is specified.
> 
> That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> 
> I think the growth limits are simply too large unless we compute a "win"
> which we in this case do not.  So I'd say the growth limits should scale
> with win ^ (1/new param) thus if we estimate to eliminate 20% of the
> loop stmts due to unrolling then the limit to apply is
> limit * (0.2 ^ (1/X)) with X maybe defaulting to 2.
> 
> I'd only apply this new limit for peeling (peeling is when the loop count
> is not constant and thus we keep the exit tests).
> 
> Of course people want more peeling (hello POWER people!)

Btw, the issue with the rs6000 code at present is that it uses
unroll_only_small_loops but that only affects the RTL unroller
while the enablement of -funroll-loops at -O2 affects GIMPLE
as well but unconstrained (with -O3 params).  For the main
unroll pass (not cunrolli) this triggers code size growth:

  unsigned int val = tree_unroll_loops_completely (flag_unroll_loops
                                                   || flag_peel_loops
                                                   || optimize >= 3, true);

the "original" patch also adjusted parameters.  If the intent is to only
affect the RTL unroller then we need a separate flag controlling it
(yeah, using the same flags as heuristic trigger was probably bad).

[Bug debug/95080] [10/11 Regression] -fcompare-debug failure (length) with -Og -fcse-follow-jumps -fnon-call-exceptions

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95080

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.2

[Bug target/95078] Missing fwprop for SIB address

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95078

--- Comment #1 from Richard Biener  ---
TER should go away, not be extended.  So you are suggesting that we replace

leaq    44(%rdi,%rdx,4), %rdx  --- redundant could be fwprop
movl    (%rdx), %eax
movl    $3, (%rsi)
addl    (%rdx), %eax

with

movl    44(%rdi,%rdx,4), %eax
movl    $3, (%rsi)
addl    44(%rdi,%rdx,4), %eax

?  The variant that looks bigger is actually one byte smaller.  Note as
soon as there are three uses it will be larger again...

So this is really something for RTL and yeah, fwprop only makes "local"
decisions.  Note that I think that your proposed variant will consume
more resources since the complex addressing modes are likely split into
a separate uop.  Yes, overall I'd expect less latency for your sequence.

[Bug debug/95077] Wrong backtrace infromation at O1

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95077

Richard Biener  changed:

   What|Removed |Added

  Known to fail||9.3.1
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-12

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug target/95076] Failure to tail-call on function call of different return type

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95076

Richard Biener  changed:

   What|Removed |Added

Summary|Failure to optimize out |Failure to tail-call on
   |stack alignment on function |function call of different
   |call of different type on   |return type
   |x86 |
 CC||hjl.tools at gmail dot com
 Target||x86_64-*-* i?86-*-*

--- Comment #1 from Richard Biener  ---
GCC doesn't tail-call because the return types are not compatible.  With a call
it cannot optimize the stack adjustment because of the ABI.

Note I'm not sure whether the ABI allows %rax to contain "garbage" in the
upper half for a function returning in %eax.  So what LLVM does may be wrong.

[Bug tree-optimization/57359] store motion causes wrong code for union access at -O3

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57359
Bug 57359 depends on bug 94988, which changed state.

Bug 94988 Summary: [11 Regression] FAIL: gcc.target/i386/pr64110.c 
scan-assembler vmovd[\\t ]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/94988] [11 Regression] FAIL: gcc.target/i386/pr64110.c scan-assembler vmovd[\\t ]

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94988

Richard Biener  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Richard Biener  ---
Fixed.

[Bug tree-optimization/95058] [11 regression] vector test case failures starting with r11-205

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95058

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-12

--- Comment #6 from Richard Biener  ---
OK, so for non 7 BE we end up not vectorizing because it doesn't look
profitable, which IMHO is good.  It would be nice to also see dumps before
the respective rev. because in theory (well...) the cost computation should
be the same.  Ah!  OK, so we now have

0x10002001470 _1 1 times vec_construct costs 2 in prologue
0x10002001470 _1 1 times vec_construct costs 2 in prologue
0x10002001470 _1 2 times vector_store costs 2 in body
0x10001ecfcc0 _1 1 times scalar_store costs 1 in body
0x10001ecfcc0 _2 1 times scalar_store costs 1 in body
0x10001ecfcc0 _3 1 times scalar_store costs 1 in body
0x10001ecfcc0 _4 1 times scalar_store costs 1 in body

that is, the SLP graph has the expected cost.  Originally we likely
had costed against 4 scalar stores and 4 scalar loads (but the scalar
loads will still be there).  On x86_64 we get

0x3975280 _1 1 times vec_construct costs 8 in prologue
0x3975280 _1 1 times vec_construct costs 8 in prologue
0x3975280 _1 2 times vector_store costs 24 in body
0x3942cb0 _1 1 times scalar_store costs 12 in body
0x3942cb0 _2 1 times scalar_store costs 12 in body
0x3942cb0 _3 1 times scalar_store costs 12 in body
0x3942cb0 _4 1 times scalar_store costs 12 in body

so it's still profitable there.

Note I suggest to leave the FAILs in place for now since in my dev tree
I see the vec_construct gone again so it would start passing again
on ppc as well.

Sorry for the intermediate breakage.

[Bug target/95083] New: x86 fp_movcc expansion depends on real_cst sharing

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95083

Bug ID: 95083
   Summary: x86 fp_movcc expansion depends on real_cst sharing
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

I see gcc.target/i386/avxfp-1.c FAILing, which is

double x;
void
t()
{
  x=x>5?x:5;
}

double x;
void
q()
{
  x=x<5?x:5;
}

and q() is recognized as FP min by ix86_expand_fp_movcc because the compare
doesn't pass prepare_cmp_insn () and later ifcvt matches up the originally
distinct pseudos for the two mentions of '5'.  For t() prepare_cmp_insn ()
succeeds and ix86_expand_fp_movcc expands this to a UNSPEC_BLEND
(because the two mentions of '5' get a different pseudo so this doesn't
look like a max).  The first prepare_cmp_insn fails because it is fed

(lt (reg:DF 82 [ x.3_1 ])
(const_double:DF 5.0e+0 [0x0.ap+3]))

and apparently we cannot do a lt compare(?) (but later during ifcvt we can).

Note the above is when expanding from a COND_EXPR, thus

t ()
{
  double x.1_1;
  double iftmp.0_3;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  x.1_1 = x;
  iftmp.0_3 = x.1_1 > 5.0e+0 ? x.1_1 : 5.0e+0;
  x = iftmp.0_3;
  return;

and

q ()
{
  double x.3_1;
  double iftmp.2_3;

;;   basic block 2, loop depth 0
;;pred:   ENTRY
  x.3_1 = x;
  iftmp.2_3 = x.3_1 < 5.0e+0 ? x.3_1 : 5.0e+0;
  x = iftmp.2_3;
  return;

similar FAILs occur for

FAIL: gcc.target/i386/avxfp-1.c scan-assembler vmaxsd
FAIL: gcc.target/i386/avxfp-2.c scan-assembler vminsd
FAIL: gcc.target/i386/ssefp-1.c scan-assembler maxsd
FAIL: gcc.target/i386/ssefp-2.c scan-assembler minsd
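
For reference, the source idioms behind these tests can be sketched as
follows, with the constant 5 replaced by a parameter y (the function names
are mine).  The point is that y has to be seen as the same operand in the
comparison and in the else arm for the select to look like a min or max;
the NaN behaviour noted in the comments is what makes the operand order
matter.

```c
#include <assert.h>

/* x > y ? x : y: if x is NaN the comparison is false and y is returned,
   which is also what maxsd does when it returns its second operand.  */
double
max_idiom (double x, double y)
{
  return x > y ? x : y;
}

/* x < y ? x : y: the corresponding min form.  */
double
min_idiom (double x, double y)
{
  return x < y ? x : y;
}
```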

So what's missing is simplification of 

Trying 8 -> 9:
8: r87:DF=r85:DF

[Bug target/95083] x86 fp_movcc expansion depends on real_cst sharing

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95083

Richard Biener  changed:

   What|Removed |Added

Version|10.0|11.0
   Keywords||missed-optimization
 CC||uros at gcc dot gnu.org
 Target||x86_64-*-* i?86-*-*

--- Comment #1 from Richard Biener  ---
Needs https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545588.html to
reproduce.

[Bug tree-optimization/95084] New: code sinking prevents if-conversion

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95084

Bug ID: 95084
   Summary: code sinking prevents if-conversion
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

There's a pass ordering issue between the sink pass and tree-if-conv, if
conversion for vectorization.  When sink sinks a possibly trapping operation
to a place that is only conditionally executed if-conversion fails which
results in failed vectorization.  This can be seen with
https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545588.html applied
for gcc.dg/vect/pr56541.c (and its ifcvt counterpart
gcc.dg/tree-ssa/ifc-pr56541.c).  But I've also seen this in other context.

Here

  iftmp.2_17 = rR_19 < rL_20 ? rR_19 : rL_20;
  iftmp.3_3 = rR_19 < rL_20 ? rL_20 : rR_19;
  if (iftmp.3_3 > 0.0)
goto ; [INV]
  else
goto ; [INV]

   :

   :
  # iftmp.4_14 = PHI 
  if (iftmp.4_14 > 0.0)

becomes

  iftmp.3_3 = rR_17 < rL_18 ? rL_18 : rR_17;
  if (iftmp.3_3 > 0.0)
goto ; [59.00%]
  else
goto ; [41.00%]

   [local count: 435831803]:
  goto ; [100.00%]

   [local count: 627172605]:
  iftmp.2_15 = rR_17 < rL_18 ? rR_17 : rL_18;
  if (iftmp.2_15 > 0.0)

and the now conditionally executed FP comparison can trap.
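
A simplified source-level picture of what sinking does here (this is not the
pr56541.c testcase, and the function names are invented) might look like
this:

```c
#include <assert.h>

/* Before sinking: both selects are computed unconditionally.  */
int
both_positive_before (float rR, float rL)
{
  float min = rR < rL ? rR : rL;
  float max = rR < rL ? rL : rR;
  if (max > 0.0f)
    if (min > 0.0f)
      return 1;
  return 0;
}

/* After sinking: the min select, including its FP comparison, only
   executes on the path where max > 0.  */
int
both_positive_after (float rR, float rL)
{
  float max = rR < rL ? rL : rR;
  if (max > 0.0f)
    {
      float min = rR < rL ? rR : rL;
      if (min > 0.0f)
        return 1;
    }
  return 0;
}
```

Both variants return the same value, but to if-convert the second one the
sunk comparison would have to be executed unconditionally again, which is
not allowed for an operation that may trap.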

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #28 from Richard Biener  ---
> It the growth limit seems could be refined. The ^ is an exponent operation,
> right?

Yes.  The idea is to limit growth more when there is no benefit of unrolling
detected by the cost model (which currently simply counts likely eliminated
stmts).

(In reply to Jiu Fu Guo from comment #27)
> (In reply to Jiu Fu Guo from comment #26)
> > (In reply to Richard Biener from comment #20)
> > > (In reply to Jiu Fu Guo from comment #18)
> > > > Currently, I'm thinking to enhance GCC 'cunroll' as:
> > > > if the loop has multi-exits or upbound is not a fixed number, we may 
> > > > not do
> > > > 'complete unroll' for the loop, except -funroll-all-loops is specified.
> > > 
> > > That doesn't make much sense (-funroll-all-loops is RTL unroller only).
> > 
> 
> For the loop which has multi-exits, it may not helpful to unroll it,
> especially "complete unroll" may be not helpful. Like loop in in_pack_i4.c.
> Since it would early exit, some iterations(may most iterations) were not
> executed.
> 
> Is it a good idea to disable the GIMPLE cunroll for this kind of loop? RTL
> unroll_stupid does not unroll this kind of loop either.

Well, GIMPLE cunroll specifically handles the situation of peeling such loops
and has a separate --param to control how many extra branches it may introduce
for those exits.  Generally disabling unrolling of such loops isn't a good
idea; the reason for completely unrolling loops is abstraction removal and not
necessarily producing more optimal loop kernels (the loop is gone afterwards).

One of my TODO items is to work on its costing model to the extent that
we run value-numbering on the unrolled body (that's already done) and
roll back the unrolling if there wasn't any visible benefit.  The difficult
cases are like those in SPEC calculix where for full benefit you need to
unroll the 5(!) innermost loops and to even see any benefit you need to
unroll the 3 innermost loops.

[Bug tree-optimization/95097] Missed optimization with bitfield value ranges

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95097

--- Comment #3 from Richard Biener  ---
Just to quote EVRP sees

   :
  _1 = VIEW_CONVERT_EXPR(f);
  _2 = _1 & 1048575;
  if (_2 != 0)
goto ; [INV]
  else
goto ; [INV]

   :
  _3 = f.x;
  _4 = (unsigned int) _3;
  y_8 = _4 * 4096;
  if (y_8 <= 199)

thus the f.x != 0 test has been folded by one of those $?%&! premature
fold-const transforms to

  if ((BIT_FIELD_REF  & 1048575) != 0)

the fix is to get rid of those (and fix the "fallout").
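
The range argument that gets lost can be sketched in C.  The struct and the
function name are assumptions (the testcase is not quoted in this comment);
the 20-bit width matches the 1048575 mask in the dump and unsigned int is
assumed to be 32 bits wide.

```c
#include <assert.h>

/* Hypothetical stand-in for the bit-field in the testcase.  */
struct s { unsigned x : 20; };

/* Returns 1 if the inner comparison can be true.  Once f.x != 0 is
   known, f.x is in [1, 1048575], so y is at least 4096 and does not
   wrap, which makes y <= 199 always false.  */
int
inner_test_reachable (struct s f)
{
  if (f.x != 0)
    {
      unsigned y = (unsigned) f.x * 4096;
      if (y <= 199)
        return 1;
    }
  return 0;
}
```

With the test left as f.x != 0, a value-range pass could derive that range
and fold the inner comparison; the masked BIT_FIELD_REF form hides it.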

[Bug debug/95098] Out of scope variable visible during debugging at Og

2020-05-12 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95098

Richard Biener  changed:

   What|Removed |Added

 CC||aoliva at gcc dot gnu.org,
   ||edlinger at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Don't see this with gdb:

(gdb) start
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Temporary breakpoint 4 at 0x4004bd: file z.c, line 11.
Starting program: /home/rguenther/obj/gcc/a.out 

Temporary breakpoint 4, main () at z.c:11
11  int main() { b(); }
(gdb) s

Breakpoint 3, b () at z.c:4
4   for (g_2 = 21; (g_2 < (-27)); g_2 = 0)
(gdb) p l_9
No symbol "l_9" in current context.
(gdb) info locals
l_10 = 

note there _is_ l_9 in the DWARF, even with a location:

 <2>: Abbrev Number: 8 (DW_TAG_lexical_block)
   DW_AT_low_pc  : 0xa
   DW_AT_high_pc : 0x0
 <3>: Abbrev Number: 9 (DW_TAG_variable)
   DW_AT_name: l_9
   DW_AT_decl_file   : 1
   DW_AT_decl_line   : 7
   DW_AT_decl_column : 7
   DW_AT_type: <0xeb>
   DW_AT_location: 10 byte block: 3 0 0 0 0 0 0 0 0 9f 
(DW_OP_addr: 0; DW_OP_stack_value)

but

 :
   0:   c7 05 00 00 00 00 15movl   $0x15,0x0(%rip)# a 
   7:   00 00 00 
   a:   c3  retq   

and certainly the DW_AT_high_pc of the lexical block looks "odd" - the
block is not existent.  Assembly:

b:
.LFB0:
.file 1 "z.c"
.loc 1 2 9 view -0
.cfi_startproc
.loc 1 3 5 view .LVU1
.loc 1 4 5 view .LVU2
.loc 1 4 14 is_stmt 0 view .LVU3
movl    $21, g_2(%rip)
.loc 1 4 20 is_stmt 1 view .LVU4
.LBB2:
.loc 1 7 2 view .LVU5
.LVL0:
.loc 1 8 2 view .LVU6
.LBE2:
.loc 1 10 1 is_stmt 0 view .LVU7
ret

so you can see .LBB2 to .LBE2 do not contain any actual instructions.
GIMPLE we expand from:

 b ()
{
   [local count: 1073741824]:
  [z.c:3:5] # DEBUG BEGIN_STMT
  [z.c:4:5] # DEBUG BEGIN_STMT
  [z.c:4:14] g_2 = 21;
  [z.c:4:20] # DEBUG BEGIN_STMT
  [z.c:7:2] # DEBUG BEGIN_STMT
  [z.c:7:7] # DEBUG l_9 => [z.c:7:13] &a
  [z.c:8:2] # DEBUG BEGIN_STMT
  [z.c:8:2] return;

does lldb try to interpret location views yet?  I suppose it might get
confused about the is_stmt 0 on the movl and only stop at ret
even though the "last" location on that is line 10 (but is_stmt 0 again).

It's difficult to produce a meaningful line-number program for the
resulting assembler ;)

[Bug tree-optimization/92177] [10 Regression] gcc.dg/vect/bb-slp-22.c FAILs

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92177

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from Richard Biener  ---
.

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #30 from Richard Biener  ---
(In reply to Thomas Koenig from comment #29)
> It is also interesting that this variant
> 
> --- a/libgfortran/generated/in_pack_i4.c
> +++ b/libgfortran/generated/in_pack_i4.c
> @@ -88,7 +88,7 @@ internal_pack_4 (gfc_array_i4 * source)
>count[0]++;
>/* Advance to the next source element.  */
>index_type n = 0;
> -  while (count[n] == extent[n])
> +  while (n < dim && count[n] == extent[n])
>  {
>/* When we get to the end of a dimension, reset it and increment
>   the next dimension.  */
> @@ -100,7 +100,6 @@ internal_pack_4 (gfc_array_i4 * source)
>if (n == dim)
>  {
>src = NULL;
> -  break;
>  }
>else
>  {
> 
> does not get peeled.

More optimal would be

  count[0]--;
  /* Advance to the next source element.  */
  index_type n = 0;
  while (count[n] == 0)
    {
      ...
    }

note completely peeling the inner loop isn't as bad as it looks, it's
basically making the whole loop

  while (1)
    {
      for (count[0] = 0; count[0] < extent[0]; ++count[0])
        {
          /* Copy the data.  */
          *(dest++) = *src;
          /* Advance to the next element.  */
          src += stride0;
        }
      if (dim == 1)
        break;
      count[0] = 0;
      src -= stride[0] * extent[0];
      count[1]++;
      if (count[1] != extent[1])
        continue;
      if (dim == 2)
        break;
      count[1] = 0;
      src -= stride[1] * extent[1];
      count[2]++;
      if (count[2] != extent[2])
        continue;
      if (dim == 3)
        break;
      ...
    }

which should be quite optimal for speed (branch-prediction wise).  One
might want to try to optimize code size a bit, sure.  Sacrificing a bit
of speed at the loop exit could be setting extent[n > dim] = 1 and
dropping the if (dim == N) break; checks, leaving just the last.
Likewise changing the iteration from extent[N] to zero might make
the tests smaller.  Then as commented in the code pre-computing the
products might help as well - you get one additional load of course.
Interleaving extent and the product data arrays would help cache
locality.

Note writing the loop as above will make GCC recognize it as a loop
nest.
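
A self-contained sketch of the down-counting variant, outside of
libgfortran: the function name, MAX_DIM and the use of plain int elements
are assumptions made for the example, and it has not been benchmarked.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIM 4

/* Copy a strided dim-dimensional int array into contiguous storage,
   counting each dimension down from extent[n] to zero.  */
void
pack_down_counting (int *dest, const int *src, int dim,
                    const ptrdiff_t *extent, const ptrdiff_t *stride)
{
  ptrdiff_t count[MAX_DIM];

  assert (dim >= 1 && dim <= MAX_DIM);
  for (int n = 0; n < dim; n++)
    {
      if (extent[n] <= 0)
        return;                 /* Empty array, nothing to copy.  */
      count[n] = extent[n];
    }

  while (src)
    {
      /* Copy the data.  */
      *dest++ = *src;
      /* Advance to the next source element.  */
      src += stride[0];
      count[0]--;
      int n = 0;
      while (count[n] == 0)
        {
          /* End of this dimension: reset it, rewind src and step the
             next dimension.  */
          count[n] = extent[n];
          src -= stride[n] * extent[n];
          n++;
          if (n == dim)
            {
              src = NULL;
              break;
            }
          count[n]--;
          src += stride[n];
        }
    }
}
```

The inner while now compares against zero instead of loading extent[n] for
every test; extent[] is only read when a dimension is reset.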

[Bug rtl-optimization/95102] New: missed if-conversion

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95102

Bug ID: 95102
   Summary: missed if-conversion
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

If you rewrite gcc.target/i386/pr54855-9.c to a form GIMPLE looks like after
some PRE you end up with

typedef float vec __attribute__((vector_size(16)));

vec
foo (vec x, float a)
{
  if (!(x[0] < a))
    x[0] = a;
  return x;
}

which is no longer recognized as the same and emits

foo:
.LFB0:
.cfi_startproc
comiss  %xmm0, %xmm1
ja  .L2
movss   %xmm1, %xmm0
.L2:
ret

instead of

foo:
.LFB1:  
.cfi_startproc
minss   %xmm1, %xmm0
ret

this is because RTL if-conversion does not recognize

7: r86:SF=vec_select(r84:V4SF,parallel)
8: flags:CCFP=cmp(r85:SF,r86:SF)
  REG_DEAD r86:SF
9: pc={(flags:CCFP>0)?L14:pc}
  REG_DEAD flags:CCFP
  REG_BR_PROB 536870916

   10: NOTE_INSN_BASIC_BLOCK 3
   12: r84:V4SF=vec_merge(vec_duplicate(r85:SF),r84:V4SF,0x1)
  REG_DEAD r85:SF

   14: L14:
   15: NOTE_INSN_BASIC_BLOCK 4
   20: xmm0:V4SF=r84:V4SF

the form it does recognize is

8: r82:SF=vec_select(r84:V4SF,parallel)
9: flags:CCFP=cmp(r85:SF,r82:SF)
   10: pc={(flags:CCFP>0)?L28:pc}
  REG_DEAD flags:CCFP
  REG_BR_PROB 536870916

   28: L28:
   14: NOTE_INSN_BASIC_BLOCK 3
5: r85:SF=r82:SF
  REG_DEAD r82:SF

   15: L15:
   16: NOTE_INSN_BASIC_BLOCK 4
   18: r87:V4SF=vec_merge(vec_duplicate(r85:SF),r84:V4SF,0x1)

[Bug rtl-optimization/95102] missed if-conversion

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95102

--- Comment #1 from Richard Biener  ---
OK, so one reason is that

  if (!can_conditionally_move_p (x_mode))
    return FALSE;

returns false for E_V4SFmode on x86.  min/max detection is based
on fp_cmov expansion for scalar FP on x86 though (with its own
problems, see PR95083).

[Bug target/95018] [10/11 Regression] Excessive unrolling for Fortran library array handling

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95018

--- Comment #32 from Richard Biener  ---
Note I don't think the unrolling is excessive - store motion then applying
to all count[] and all computations hoisted out of the loop may be a bit
too much for register pressure though, especially since we're using
flag-based store-motion.  But it causes the stores to be materialized
on all exits of the loop which means we end up with N*N conditional stores :/

I guess SM could be improved here.

[Bug c++/95103] Unexpected -Wclobbered in bits/vector.tcc with -O2

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95103

Richard Biener  changed:

   What|Removed |Added

Version|unknown |10.1.0
   Keywords||diagnostic

--- Comment #1 from Richard Biener  ---
Likely because of the std::vector DTOR invocation which has to access
'v' which is not declared volatile but still "live" across the setjmp.

Does it work if you place the initial part of the function in a separate { }?

[Bug testsuite/95110] new test case in r11-345 error: gcc.dg/tree-ssa/pr94969.c: dump file does not exist

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95110

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Richard Biener  ---
Fixed.

[Bug fortran/95109] [11 regression] ICE in gfortran.dg/gomp/target1.f90 after r11-349

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95109

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |11.0

[Bug target/95112] i686 procedures have prolog endbr32

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95112

--- Comment #1 from Richard Biener  ---
Try -fcf-protection=none

[Bug tree-optimization/95113] [10/11 Regression] Wrong code w/ -O2 -fexceptions -fnon-call-exceptions

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95113

Richard Biener  changed:

   What|Removed |Added

 Blocks||93385
   Priority|P3  |P2


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93385
[Bug 93385] [10/11 Regression] wrong code with u128 modulo at -O2 -fno-dce
-fno-ipa-cp -fno-tree-dce

[Bug middle-end/95108] [9/10/11 Regression] ICE in tree_fits_uhwi_p, at tree.c:7292

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95108

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug fortran/95107] [10/11 Regression] ICE in hash_operand, at fold-const.c:3768

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95107

Richard Biener  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org
   Priority|P3  |P2
   Target Milestone|--- |10.2

[Bug middle-end/95115] RISC-V 64: inf/inf division optimized out, invalid operation not raised

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
 Target|riscv64-unknown-linux-gnu   |
Summary|[10 Regression] RISC-V 64:  |RISC-V 64: inf/inf division
   |inf/inf division optimized  |optimized out, invalid
   |out, invalid operation not  |operation not raised
   |raised  |
   Last reconfirmed||2020-05-14
 Ever confirmed|0   |1
  Component|target  |middle-end
  Build|riscv64-unknown-linux-gnu   |
   Host|riscv64-unknown-linux-gnu   |
 Status|UNCONFIRMED |NEW

--- Comment #6 from Richard Biener  ---
(simplify
 (rdiv @0 @0)
 (if (FLOAT_TYPE_P (type)
  && ! HONOR_NANS (type)
  && ! HONOR_INFINITIES (type))
  { build_one_cst (type); }))

so that's not it, possibly constant folding instead in const_binop.
There we only have

      /* Don't perform operation if we honor signaling NaNs and
         either operand is a signaling NaN.  */
      if (HONOR_SNANS (mode)
          && (REAL_VALUE_ISSIGNALING_NAN (d1)
              || REAL_VALUE_ISSIGNALING_NAN (d2)))
        return NULL_TREE;

and

      /* Don't perform operation if it would raise a division
         by zero exception.  */
      if (code == RDIV_EXPR
          && real_equal (&d2, &dconst0)
          && (flag_trapping_math || ! MODE_HAS_INFINITIES (mode)))
        return NULL_TREE;

which both don't trigger.  Afterwards

      inexact = real_arithmetic (&value, code, &d1, &d2);

even returns false and the result is a qNaN.

For the specific regression in this bug we now simply are able to
turn

return u.x/v.x;

into a division of two constants.  That's nothing we're going to "fix",
so we have to fix the above instead which is a much older issue.

[Bug tree-optimization/95118] [10/11 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang

2020-05-13 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95118

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2
   Keywords||compile-time-hog
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Target Milestone|--- |10.2
Summary|gcc-10 and master -O3   |[10/11 Regression] gcc-10
   |-fopt-info-vec causes gcc   |and master -O3
   |to hang |-fopt-info-vec causes gcc
   ||to hang
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-14
  Known to work||9.3.0

--- Comment #5 from Richard Biener  ---
On the GCC 10 branch I see it not returning from

(gdb) fin
Run till exit from #0  0x0107148d in real_to_decimal_for_mode (
str=0x7fffcd60 "\200", r_orig=0x7fffcd40, buf_size=100, digits=57, 
crop_trailing_zeros=1, mode=E_VOIDmode)
at /space/rguenther/src/gcc-10-branch/gcc/real.c:1718

we're in this loop:

  while (1)
    {
      /* Stop if we'd shift bits off the bottom.  */
      if (v.sig[0] & 7)
        break;

      do_multiply (&u, &v, ten);

      /* Stop if we're now >= 1.  */
      if (REAL_EXP (&u) > 0)
        break;

      v = u;
      dec_exp -= 1;
    }

(gdb) p u
$1 = {cl = 0, decimal = 0, sign = 0, signalling = 0, canonical = 0, uexp = 0, 
  sig = {0, 0, 0}}

and the original REAL_VALUE_TYPE is

(gdb) p *r_orig
$4 = {cl = 1, decimal = 0, sign = 0, signalling = 0, canonical = 0, 
  uexp = 67092486, sig = {0, 0, 0}}

so it's simply a weird not normalized constant zero ...

I have a patch to paper over this in real_to_decimal_for_mode which then
prints

  a (0.0e-8191);

for the specific number.
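
Why the loop cannot terminate for such an input can be shown with a toy
model that uses double instead of REAL_VALUE_TYPE.  The function name and
the zero check are hypothetical and only illustrate the idea of a guard;
this is not the actual patch.

```c
#include <assert.h>

/* Toy model of the scaling loop: multiply v by ten until it reaches 1
   and record the decimal exponent adjustment in *dec_exp.  Returns 0
   without looping if v is zero, 1 otherwise.  Without the zero check
   a zero input would never leave the loop, because 0 * 10 stays
   below 1 forever.  */
int
scale_to_one (double v, int *dec_exp)
{
  assert (v >= 0.0 && v < 1.0);
  *dec_exp = 0;
  if (v == 0.0)
    return 0;                   /* Hypothetical guard.  */

  while (1)
    {
      double u = v * 10.0;
      /* Stop if we're now >= 1.  */
      if (u >= 1.0)
        break;
      v = u;
      *dec_exp -= 1;
    }
  return 1;
}
```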

[Bug bootstrap/95122] Cross-compile arm32 toolchain with hard float, but Error in gcc final

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95122

Richard Biener  changed:

   What|Removed |Added

 Target||arm-linux-gnueabihf
 Ever confirmed|0   |1
 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2020-05-14

--- Comment #1 from Richard Biener  ---
You seem to be building from inside the source directory; that is not
supported.  Please create a separate object directory like

mkdir obj
cd obj
../configure 

and re-try.

[Bug tree-optimization/95118] [10 Regression] gcc-10 and master -O3 -fopt-info-vec causes gcc to hang

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95118

Richard Biener  changed:

   What|Removed |Added

  Known to work||11.0
Summary|[10/11 Regression] gcc-10   |[10 Regression] gcc-10 and
   |and master -O3  |master -O3 -fopt-info-vec
   |-fopt-info-vec causes gcc   |causes gcc to hang
   |to hang |
  Known to fail||10.1.0

--- Comment #7 from Richard Biener  ---
Fixed on trunk so far.

[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

--- Comment #12 from Richard Biener  ---
(In reply to pskocik from comment #11)
> Thanks for the shot at a fix, Richard Biener.
> 
> Since I have reported this, I think I should mention a related
> suboptimality that should probably be getting fixed alongside with this (if
> this one is getting fixed), namely that while
> 
> 
> int64_t zextend_int_to_int64_nospill(int *X) 
> { 
> union { int64_t _; } r = {0}; return memcpy(&r._,X,sizeof(*X)),r._;
> }
> 
> (and hopefully later even 
> 
> int64_t zextend_int_to_int64_spill(int *X) { int64_t r = {0}; return
> memcpy(&r,X,sizeof(*X)),r; }
> )
> 
> generates, on x86_64, the optimal
> 
> zextend_int_to_int64_nospill:
> mov eax, DWORD PTR [rdi]
> ret
> 
> for zeroextending promotions of sub-int types, an extra xor instruction gets
> generated, e.g.:
> 
> 
> int64_t zextend_short_to_int64_nospill_but_suboptimal(short *X) 
> {
> union { int64_t _; } r ={0}; return memcpy(&r._,X,sizeof(*X)),r._;
> }
> 
> =>
> 
> zextend_short_to_int64_nospill_but_suboptimal:
> xor eax, eax
> mov ax, WORD PTR [rdi]
> ret
> 
> which was surprising to me because it doesn't happen with zero-extending
> memcpy-based promotion from {,u}ints to larger types ({,u}{,l}longs).
> 
> https://gcc.godbolt.org/z/ZjXaCw

I think this is PR93507 for which I have a patch queued as well.

[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

--- Comment #13 from Richard Biener  ---
(In reply to r...@cebitec.uni-bielefeld.de from comment #10)
> > --- Comment #9 from Richard Biener  ---
> [...]
> > Hmm, OK looks like memcpy is not folded, likely because the source is
> > not known to be appropriately aligned.
> [...]
> > should fix this.  Can you verify and if so, commit?  Thx.
> 
> Unfortunately, it doesn't.

OK, this only helps a bit later since CCP is required to propagate the
alignment, the following forwprop pass to elide the memcpy and then
finally the update-address-taken invocation in the _second_ CCP pass
after inlining will have

pr94703.c.093t.ccp2:No longer having address taken: r

I've long pondered removing the memcpy folding restriction for strict-align
targets but never went through with it.

I'll update the testcase to require

/* { dg-require-effective-target non_strict_align } */

[Bug middle-end/94703] Small-sized memcpy leading to unnecessary register spillage unless done through a dummy union

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94703

Richard Biener  changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Richard Biener  ---
Fixed.

[Bug target/94087] std::random_device often fails when used from multiple threads

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2020-05-14

--- Comment #10 from Richard Biener  ---
So it looks like the rdseed usage is new in GCC 10 libstdc++ and it prevails
over the previous rdrand support if supported on your CPU.

I can reproduce this on a CPU with rdseed support and libstdc++ from GCC 10.

The code invoked looks correct to me:

  20:   83 e8 01                sub    $0x1,%eax
  23:   74 12                   je     37 <_ZNSt12_GLOBAL__N_112__x86_rdseedEPv+0x37>
  25:   f3 90                   pause
  27:   0f c7 fa                rdseed %edx
  2a:   89 11                   mov    %edx,(%rcx)
  2c:   73 f2                   jae    20 <_ZNSt12_GLOBAL__N_112__x86_rdseedEPv+0x20>

the number of tries libstdc++ does is 100.  Note rdrand doesn't exhibit this
issue.

So it might very well be a hardware limitation.  Btw, the reproducer can be
"enhanced" by providing the method of operation:

std::random_device rd("rdseed");

that makes sure it will fail in a different way on a CPU without rdseed
support (rdseed needs Intel Broadwell or later, or AMD Zen).

[Bug target/94087] std::random_device often fails when used from multiple threads

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94087

Richard Biener  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com,
   ||redi at gcc dot gnu.org

--- Comment #11 from Richard Biener  ---
HJ, is what libstdc++ does "unreasonable" (it uses rdseed by default if
available) and could it do better?  Can you reproduce the issue?
The docs quoted by Andrew suggest that libstdc++ should, when retries
are not enough, fall back to another method.
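
A minimal sketch of that "retry, then fall back" shape, assuming each
entropy source is a callback that reports failure the way rdseed does via
the carry flag.  This is not the libstdc++ code; entropy_source,
get_entropy and the two test doubles are hypothetical names used only for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* A source fills *OUT and returns true on success, or returns false when
   it has no value ready.  */
typedef bool (*entropy_source) (void *state, uint32_t *out);

/* Try PRIMARY up to RETRIES times, then give SECONDARY one chance.
   Returns false only if both fail.  */
static bool
get_entropy (entropy_source primary, void *primary_state,
             entropy_source secondary, void *secondary_state,
             unsigned retries, uint32_t *out)
{
  for (unsigned i = 0; i < retries; ++i)
    if (primary (primary_state, out))
      return true;
  return secondary (secondary_state, out);
}

/* Test double: a source that is always exhausted; counts its calls.  */
static bool
always_busy (void *state, uint32_t *out)
{
  unsigned *calls = state;
  ++*calls;
  (void) out;
  return false;
}

/* Test double: a source that always succeeds with a fixed value.  */
static bool
fixed_value (void *state, uint32_t *out)
{
  *out = *(const uint32_t *) state;
  return true;
}
```

Calling get_entropy with always_busy as the primary, fixed_value as the
secondary and a retry count of 100 (the count mentioned above) exhausts the
retries and then returns the fallback's value instead of failing.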

[Bug c/95126] Missed opportunity to turn static variables into immediates

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95126

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-14

--- Comment #1 from Richard Biener  ---
Confirmed.  Only RTL expansion sees the aggregate copy involved with the
function call and this, when folded from a constant initializer, is not
subject to clever things such as merging of stores.

[Bug target/95125] Unoptimal code for vectorized conversions

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95125

Richard Biener  changed:

   What|Removed |Added

Version|unknown |11.0
 Ever confirmed|0   |1
   Last reconfirmed||2020-05-14
 Target||x86_64-*-* i?86-*-*
   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
ISTR I filed a duplicate 10 years ago or so.  The issue is the vectorizer
could not handle V4DFmode -> V4SFmode conversions.

Could, because for SVE we added the capability but this requires
additional instruction patterns (IIRC I filed a bug about this last
year).  Yep.  PR92658 it is.
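
For illustration, a loop of roughly this shape is where such a conversion
comes up: with 256-bit vectors, four doubles (V4DFmode) are narrowed to
four floats (V4SFmode).  This is a hand-written guess at the pattern, not
the testcase from the report, and narrow_to_float is a hypothetical name.

```c
/* Narrow N doubles from SRC to floats in DST.  The restrict qualifiers
   tell the compiler the two arrays do not overlap.  */
void
narrow_to_float (float *restrict dst, const double *restrict src, int n)
{
  for (int i = 0; i < n; ++i)
    dst[i] = (float) src[i];
}
```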

[Bug rtl-optimization/95123] [10/11 Regression] Wrong code w/ -O2 -fselective-scheduling2 -funroll-loops --param early-inlining-insns=5 --param loop-invariant-max-bbs-in-loop=3 --param max-jump-thread

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95123

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |10.2

[Bug pch/95131] Instantiate templates at pch generation time

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95131

Richard Biener  changed:

   What|Removed |Added

 CC||nathan at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
Modules are the future, not sure how this applies there.

[Bug rtl-optimization/11832] Optimization of common stores in switch statements

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11832
Bug 11832 depends on bug 33315, which changed state.

Bug 33315 Summary: stores not commoned by sinking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/33315] stores not commoned by sinking

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #16 from Richard Biener  ---
Fixed on trunk.  Individual missed cases should be tracked by separate
bugreports.
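
As a reminder of what commoning stores by sinking means, here is a small
hand-written illustration (not the testcase from this PR; both function
names are hypothetical).  The first function stores to *p on both arms;
the second is the form one would expect after sinking, with a single store
at the join.

```c
/* Both arms of the branch store to the same location.  */
void
store_branchy (int *p, int c, int a, int b)
{
  if (c)
    *p = a;
  else
    *p = b;
}

/* Same effect with the store sunk below the branch: only the value
   selection remains conditional.  */
void
store_commoned (int *p, int c, int a, int b)
{
  int tmp = c ? a : b;
  *p = tmp;
}
```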

[Bug other/16996] [meta-bug] code size improvements

2020-05-14 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16996
Bug 16996 depends on bug 33315, which changed state.

Bug 33315 Summary: stores not commoned by sinking
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=33315

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED
