[Bug lto/115432] Building a program with -flto generates wrong code (missing the call to a function) unless -fno-strict-aliasing

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115432

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Richard Biener  ---
struct file_output_stream
{

union
{
void *voidp;
int fd;
} data;

const output_stream_vtbl* vtbl;
};

struct output_stream
{
void* data;
const output_stream_vtbl* vtbl;
};

those are two unrelated types.  Doing

 ((file_output_stream *)p)->vtbl = x;
 ... = ((output_stream *)p)->vtbl;

is invoking undefined behavior (unless -fno-strict-aliasing).
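
A minimal sketch of one layout that avoids the problem, assuming the code can be restructured so that the file stream embeds the generic stream as its first member. The reduced vtbl type, the `base` member and the two function names are hypothetical and not taken from the report:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced vtable type; the real one is not shown in the report. */
typedef struct output_stream_vtbl {
    const char *name;
} output_stream_vtbl;

struct output_stream {
    void *data;
    const output_stream_vtbl *vtbl;
};

/* The file stream embeds the generic stream as its FIRST member, so a
   pointer to it may be converted to a pointer to that member. */
struct file_output_stream {
    struct output_stream base;
    int fd;
};

static const output_stream_vtbl file_vtbl = { "file" };

void file_output_stream_init(struct file_output_stream *s, int fd)
{
    assert(s != NULL);
    s->base.data = NULL;
    s->base.vtbl = &file_vtbl;   /* store through the output_stream subobject */
    s->fd = fd;
}

const output_stream_vtbl *output_stream_get_vtbl(const struct output_stream *s)
{
    assert(s != NULL);
    return s->vtbl;              /* load through the same type as the store */
}
```

With this layout both the store and the load access `vtbl` as a member of `struct output_stream`, so no access goes through an unrelated struct type.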

[Bug lto/115432] Building a program with -flto generates wrong code (missing the call to a function) unless -fno-strict-aliasing

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115432

--- Comment #1 from Richard Biener  ---
Unless output_stream is the same type as file_output_stream, is derived from
it, or contains a file_output_stream object as its first member, you invoke
undefined behavior when the calls that follow read from the object via
output_stream or another altogether different type (buffer_output_stream?).

[Bug tree-optimization/115426] ICE: in execute_todo, at passes.cc:2138

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115426

--- Comment #3 from Richard Biener  ---
I think this is a gimplification failure.  'r' is neither TREE_ADDRESSABLE
nor DECL_NOT_GIMPLE_REG and the =X constraint results in both allow_reg
and allow_mem but we gimplify it as is_gimple_lvalue which should,
as the base is a gimple register, emit a component extract to pre_p and
a complex build to post_p.

gimplify_compound_lval correctly sees this and forces a register argument
to the __imag operation but I'm not sure that's enough for lvalues.
IIRC a simple

 __imag x = 1;

also doesn't have DECL_NOT_GIMPLE_REG on 'x', and gimplify_compound_lval
behaves the same.  Still we eventually gimplify to

  _1 = REALPART_EXPR <x>;
  x = COMPLEX_EXPR <_1, 1.0e+0>;
  D.2772 = x;

which is done via gimplify_modify_expr_complex_part.  That suggests
it's gimplify_asm_expr that would need to do this very same thing as we
seem to rely on this for correctness.

With "=r" we correctly gimplify to

  __asm__("" : "=r" D.2772);
  _1 = REALPART_EXPR <r>;
  r = COMPLEX_EXPR <_1, D.2772>;
  D.2773 = r;

[Bug tree-optimization/115426] ICE: in execute_todo, at passes.cc:2138

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115426

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from Richard Biener  ---
Mine (into-SSA is broken it seems)

[Bug tree-optimization/115423] Inlined strcmp optimizes poorly

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115423

--- Comment #2 from Richard Biener  ---
You could also say rtl-optimization does a bad job with the inlined version.
Or we should inline strcmp on GIMPLE to get the first char optimized.

Consider

 strcmp (c, "ABCDEFGHabcdefgh")
 || strcmp (c, "ABCDEFGHfoobar")

thus strings with a common prefix which we could optimize as

 strncmp (c, "ABCDEFGH", 8)
 && (strcmp (c+8, "abcdefgh")
 || strcmp (c+8, "foobar"))

as a more general transform.

I should say inline_string_cmp should consider using larger unaligned
reads as well.
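
A minimal sketch of the common-prefix idea, assuming the comparisons are meant as equality tests (`strcmp (...) == 0`). The two function names are hypothetical; this only illustrates the source-level equivalence, not what the compiler would emit:

```c
#include <assert.h>
#include <string.h>

/* Straightforward form: two full comparisons against strings that share
   the 8-byte prefix "ABCDEFGH". */
int matches_either(const char *c)
{
    assert(c != NULL);
    return strcmp(c, "ABCDEFGHabcdefgh") == 0
           || strcmp(c, "ABCDEFGHfoobar") == 0;
}

/* Factored form: test the shared prefix once, then compare only the tails.
   c + 8 is evaluated only after the first 8 bytes matched, so it still
   points into the string (at worst at its terminating NUL). */
int matches_either_prefix(const char *c)
{
    assert(c != NULL);
    return strncmp(c, "ABCDEFGH", 8) == 0
           && (strcmp(c + 8, "abcdefgh") == 0
               || strcmp(c + 8, "foobar") == 0);
}
```

The factored form rejects most non-matching inputs after a single prefix comparison, which is also where a wider unaligned read would pay off.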

[Bug libstdc++/58909] C++11's condition variables fail with static linking

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58909

Richard Biener  changed:

   What|Removed |Added

 CC||ilg at livius dot net

--- Comment #28 from Richard Biener  ---
*** Bug 115421 has been marked as a duplicate of this bug. ***

[Bug libstdc++/115421] Multi-threaded condition_variable app throws when linking as -static on Linux

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115421

Richard Biener  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #16 from Richard Biener  ---
dup then

*** This bug has been marked as a duplicate of bug 58909 ***

[Bug bootstrap/115416] [13/14/15 regression] Setting --includedir to a nonexistent directory causes a build error since r13-5490-g59e4c98173a79f

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115416

Richard Biener  changed:

   What|Removed |Added

Version|unknown |14.1.0
   Target Milestone|--- |13.4

[Bug tree-optimization/115427] fallback for interclass mathfn bifs like isinf, isfinite, isnormal

2024-06-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115427

--- Comment #2 from Richard Biener  ---
The canonical way would be to handle these in the ISEL pass and remove
the (fallback) expansion.  But then we can see whether the expander FAILs
(ideally expanders would never be allowed to FAIL, and for FAILing expanders
we'd have a way to query the target like we have the vec_perm_const hook).

But I'll note that currently the expanders may FAIL but then we expand to
a call rather than the inline-expansion (and for example AVR relies on this
now to avoid early folding of isnan).

So - for the cases of isfinite and friends without a fallback call I
would suggest to expand from ISEL to see if it FAILs and throw away
the result (similar to how IVOPTs probes things).  Or make those _not_
allowed to FAIL?  Why would they fail to expand anyway?
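
For illustration only, a sketch of what a generic integer-based fallback for these classification functions can look like in plain C, assuming `float` is IEEE 754 binary32. The `fallback_*` and `float_*` names are hypothetical, and this is not the code GCC's expanders generate:

```c
#include <stdint.h>
#include <string.h>

/* Copy out the binary32 representation; memcpy avoids aliasing problems. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse helper, handy for constructing special values in tests. */
float float_from_bits(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Finite: the exponent field is not all ones. */
int fallback_isfinite(float f)
{
    return (float_bits(f) & 0x7f800000u) != 0x7f800000u;
}

/* Infinite: exponent all ones and mantissa zero, sign ignored. */
int fallback_isinf(float f)
{
    return (float_bits(f) & 0x7fffffffu) == 0x7f800000u;
}

/* Normal: the exponent field is neither zero nor all ones. */
int fallback_isnormal(float f)
{
    uint32_t e = float_bits(f) & 0x7f800000u;
    return e != 0 && e != 0x7f800000u;
}
```

Each test is a mask and a compare on the integer image of the value, so it never raises a floating-point exception and needs no library call.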

[Bug middle-end/115388] [15 Regression] wrong code at -O3 on x86_64-linux-gnu since r15-571-g1e0ae1f52741f7

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115388

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Richard Biener  ---
Fixed.  Unfortunately this didn't fix PR115256 if I checked correctly.  Keep
searching!

[Bug middle-end/115405] wrong code with _BitInt() sign-extension with -fno-strict-aliasing -O1 and above

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115405

--- Comment #3 from Richard Biener  ---
It's not visible but I assume that _4 doesn't have _BitInt(17) type?

The

  if (known_eq (offset, 0)
      && !reverse
      && poly_int_tree_p (TYPE_SIZE (type), &type_size)
      && known_eq (GET_MODE_BITSIZE (DECL_MODE (base)), type_size))

check tries to assess that no extension is required, does it work if you
adjust that for the _BitInt case?

OTOH the reduce_bit_field handling in VIEW_CONVERT_EXPR expansion looks
misplaced - shouldn't it be before the INTEGRAL_TYPE_P handling?
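
As a reference for what the extension has to produce, a small sketch of sign-extending a 17-bit value held in a wider integer, written in portable C. The function name is hypothetical and this is not the expander code under discussion:

```c
#include <stdint.h>

/* Sign-extend the low 17 bits of v to 32 bits; the upper 15 bits of v
   are ignored, as they would be for a _BitInt(17) stored in a wider mode. */
int32_t sign_extend_17(uint32_t v)
{
    const uint32_t sign = UINT32_C(1) << 16;   /* bit 16 is the sign bit */
    v &= (sign << 1) - 1;                      /* keep bits 0..16 only */
    /* Flipping the sign bit and subtracting its weight maps
       [0x10000, 0x1ffff] to [-65536, -1] and leaves [0, 0xffff] alone. */
    return (int32_t)(v ^ sign) - (int32_t)sign;
}
```

If an expansion path skips this step because it assumes the value already fills the whole mode, the upper bits keep whatever was there before.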

[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #8 from Richard Biener  ---
Fixed.

[Bug middle-end/115388] [15 Regression] wrong code at -O3 on x86_64-linux-gnu since r15-571-g1e0ae1f52741f7

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115388

--- Comment #4 from Richard Biener  ---
It's DSE5 deleting

  Deleted dead store: a[b.19_216] = 1;

there's a big irreducible region following the loop with this store, but
I fail to see how we can reach the load without going through the other
redundant store.

Ah, wait - it's the same as with loops in irreducible regions and triggering
a latent issue.  We do

  /* If we visit this PHI by following a backedge then we
     have to make sure ref->ref only refers to SSA names
     that are invariant with respect to the loop
     represented by this PHI node.  */
  if (dominated_by_p (CDI_DOMINATORS, gimple_bb (stmt),
                      gimple_bb (use_stmt))
      && !for_each_index (ref->ref ? &ref->ref : &ref->base,
                          check_name, gimple_bb (use_stmt)))
    return DSE_STORE_LIVE;

but we identify backedges by using dominators which only works for natural
loops and not irreducible regions.  We have to either disregard all refs in
irreducible regions or check for invariantness in the irreducible (sub-)region
spanned by the PHI and the backedge source.

I'm going to check the latter.
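
For readers less familiar with the term, a minimal sketch of an irreducible region in C. This is a hypothetical function, not the testcase from the report: the cycle between labels `a` and `b` has two entries, so neither block dominates the other and the edge that closes the cycle cannot be recognized by a dominance check.

```c
/* The cycle a -> b -> a can be entered at a (fall through) or at b (goto),
   so it has no single header block. */
int irreducible(int n, int enter_at_b)
{
    int steps = 0;
    if (enter_at_b)
        goto b;
a:
    steps += 1;
    if (steps >= n)
        return steps;
b:
    steps += 2;
    if (steps >= n)
        return steps;
    goto a;   /* closes the cycle, yet a does not dominate this block */
}
```

In a natural loop the header dominates every block in the loop, which is what the `dominated_by_p` test relies on. Here the block at `b` is reachable without passing through `a`, so that test misses the backedge.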

[Bug debug/115386] ice with -g -O3

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115386

--- Comment #8 from Richard Biener  ---
(In reply to David Binderman from comment #7)
> (In reply to Richard Biener from comment #6)
> > Are you using a compiler with release checking?  
> 
> No, with asan & ubsan. 
> 
> I tried running cc1 under gdb and got this backtrace:
> 
> #0  0x00b54615 in gt_ggc_mx_rtx_def (x_p=0x7fffe939bd00)
> at gtype-desc.cc:323
> #1  0x00b54829 in gt_ggc_mx_rtx_def (x_p=)
> at gtype-desc.cc:940
> #2  0x00b55405 in gt_ggc_mx_rtx_def (x_p=)
> at gtype-desc.cc:717
> #3  0x00b55405 in gt_ggc_mx_rtx_def (x_p=)
> at gtype-desc.cc:717
> #4  0x00b55405 in gt_ggc_mx_rtx_def (x_p=)
> at gtype-desc.cc:717
> #5  0x00b55405 in gt_ggc_mx_rtx_def (x_p=)
> at gtype-desc.cc:717
> 
> That continues on for a depth of more than 1000 frames.

Yes, the garbage collecting marking can be deeply recursive.  I guess
asan/ubsan cause the marker functions to consume more stack.

The issue can likely be reproduced even w/o asan/ubsan by lowering the
stack size though I'm not sure how much the frame size of
gt_ggc_mx_rtx_def explodes with asan/ubsan (or other functions in
gtype-desc.cc).  It might make sense to exempt gtype-desc.cc from
asan/ubsan instrumentation.

Lowering the stack size to 1MB down from 8MB still doesn't make it
reproduce without UBSAN/ASAN ...

[Bug middle-end/115411] ICE : in expand_call, at calls.cc:3668

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115411

Richard Biener  changed:

   What|Removed |Added

  Component|c   |middle-end
   Keywords||ice-on-valid-code

--- Comment #1 from Richard Biener  ---
I think there are related bugs where error-recovery for the

/root/gdbtest/gcctest/gcc_llvm/gcc/z2.cc:5:5: sorry, unimplemented: passing too large argument on stack
5 |   f (*x);
  |   ~~^~~~

error isn't fool-proof.

[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395

--- Comment #6 from Richard Biener  ---
In fact, the main loop ends up not using SLP but the epilogue one does and
we end up setting STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT which we do not
support for SLP.

The question is whether to add that support or simply fail (but this is
code generation).  It's probably easiest to transitionally implement
support and rip it out again later.

[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395

--- Comment #5 from Richard Biener  ---
It needs epilogue vectorization to trigger and it's the path re-using the
vector accumulator from the earlier loop that goes wrong when the main
vector loop is skipped.

We apply the initial value adjustment to the scalar result but the
continuation fails to do this and the epilogue vector epilogue expects
the earlier code to have done it.

IIRC we force "optimization" of this to be disabled but obviously somehow
fail to do this for SLP.

[Bug debug/115386] ice with -g -O3

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115386

Richard Biener  changed:

   What|Removed |Added

Version|unknown |15.0

--- Comment #6 from Richard Biener  ---
Are you using a compiler with release checking?  Stack overflow with the GGC
recursion might depend on not collecting too often as it would happen with
checking enabled.

I don't see expand taking much time on x86_64, most is IL verification
and if that's disabled sched2.

[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382

--- Comment #4 from Richard Biener  ---
(In reply to Robin Dapp from comment #3)
> For the record - the hunk before bootstrapped and regtested on the cfarm
> machines and tested successfully on aarch64 qemu with sve.  I still need to
> set up a regtest environment with SME.

I think the patch is OK, so I suggest to post it and CC Richard S. so he
can chime in.

[Bug target/115404] [15 Regression] possibly wrong code on glibc-2.39 since r15-1113-gde05e44b2ad963

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115404

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 Target||x86_64-*-* i?86-*-*

[Bug tree-optimization/115395] [15 regression] libarchive miscompiled with -O2 -march=znver2 -fno-vect-cost-model since r15-1006-gd93353e6423eca

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115395

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #4 from Richard Biener  ---
Mine.

[Bug lto/115394] ICE in lto_read_decls for a minimal C test-case with streamer_debugging set to true

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115394

Richard Biener  changed:

   What|Removed |Added

   Keywords||internal-improvement

--- Comment #1 from Richard Biener  ---
I'm quite sure streamer_debugging was never updated after the rewrite a few
years ago.  I'd suggest removing all traces of it; it adds only a very weak
bit of debugging on top of the existing consistency checks.

[Bug middle-end/115388] [15 Regression] wrong code at -O3 on x86_64-linux-gnu since r15-571-g1e0ae1f52741f7

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115388

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
Version|unknown |15.0

--- Comment #3 from Richard Biener  ---
Ah, finally a small testcase.  I'll have a look.

[Bug rtl-optimization/115384] [15 Regression] ICE: RTL check: expected code 'const_int', have 'const_wide_int' in simplify_binary_operation_1, at simplify-rtx.cc:4088 since r15-1047-g7876cde25cbd2f

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115384

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Richard Biener  ---
Should be fixed now.

[Bug tree-optimization/115382] Wrong code with in-order conditional reduction and masked loops

2024-06-10 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382

--- Comment #2 from Richard Biener  ---
I think it should work, but there's also prepare_vec_mask which is using a
cache but I have no idea whether this is applicable for non-load/store and
whether there's extra work to be done for it to be usable.

Richard?

[Bug tree-optimization/115385] Peeling for gaps can be optimized more or needs to peel more than one iteration

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115385

Richard Biener  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-06-07
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Mine.

[Bug tree-optimization/115385] New: Peeling for gaps can be optimized more or needs to peel more than one iteration

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115385

Bug ID: 115385
   Summary: Peeling for gaps can be optimized more or needs to
peel more than one iteration
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Consider

void __attribute__((noipa)) foo(unsigned char * __restrict x,
unsigned char *y, int n)
{
  for (int i = 0; i < n; ++i)
{
  x[16*i+0] = y[3*i+0];
  x[16*i+1] = y[3*i+1];
  x[16*i+2] = y[3*i+2];
  x[16*i+3] = y[3*i+0];
  x[16*i+4] = y[3*i+1];
  x[16*i+5] = y[3*i+2];
  x[16*i+6] = y[3*i+0];
  x[16*i+7] = y[3*i+1];
  x[16*i+8] = y[3*i+2];
  x[16*i+9] = y[3*i+0];
  x[16*i+10] = y[3*i+1];
  x[16*i+11] = y[3*i+2];
  x[16*i+12] = y[3*i+0];
  x[16*i+13] = y[3*i+1];
  x[16*i+14] = y[3*i+2];
  x[16*i+15] = y[3*i+0];
}
}

and

void __attribute__((noipa)) bar(unsigned char * __restrict x,
unsigned char *y, int n)
{
  for (int i = 0; i < n; ++i)
{
  x[16*i+0] = y[5*i+0];
  x[16*i+1] = y[5*i+1];
  x[16*i+2] = y[5*i+2];
  x[16*i+3] = y[5*i+3];
  x[16*i+4] = y[5*i+4];
  x[16*i+5] = y[5*i+0];
  x[16*i+6] = y[5*i+1];
  x[16*i+7] = y[5*i+2];
  x[16*i+8] = y[5*i+3];
  x[16*i+9] = y[5*i+4];
  x[16*i+10] = y[5*i+0];
  x[16*i+11] = y[5*i+1];
  x[16*i+12] = y[5*i+2];
  x[16*i+13] = y[5*i+3];
  x[16*i+14] = y[5*i+4];
  x[16*i+15] = y[5*i+0];
}
}

for both loops we currently cannot reduce the access for the load from 'y' to
not touch extra elements so we force peeling for gaps.  But in both cases
peeling a single scalar iteration is not sufficient and we get

t.c:5:21: note:   ==> examining statement: _3 = y[_1];
t.c:5:21: missed:   peeling for gaps insufficient for access
t.c:7:20: missed:   not vectorized: relevant stmt not supported: _3 = y[_1];

we can avoid this excessive peeling for gaps if we narrow the load from 'y'
to the next power-of-two size where then it's always sufficient to just
peel a single scalar iteration.  When the target cannot construct a vector
with those elements we'd have to peel more than one iteration.

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383

--- Comment #4 from Richard Biener  ---
Created attachment 58378
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58378&action=edit
patch

I'm testing this, but I do not have hardware to test correctness (and qemu not
set up).

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build since r15-1053-g28edeb1409a7b8

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #3 from Richard Biener  ---
So we're now doing a EXTRACT_LAST_REDUCTION with multiple stmt copies which
is disallowed for non-SLP (by accident?).  It shows it of course doesn't
work since we end up removing the scalar reduction stmt multiple times.

   [local count: 860067202]:
  # j_12 = PHI 
  # i_14 = PHI 
  # vect_vec_iv_.9_45 = PHI <_46(8), _47(28)>
  _46 = vect_vec_iv_.9_45 + { 16, 16, 16, 16 };
  _48 = vect_vec_iv_.9_45 + { 4, 4, 4, 4 };
  _49 = _48 + { 4, 4, 4, 4 };
  _50 = _49 + { 4, 4, 4, 4 };
  vect__1.10_51 = (vector(4) float) vect_vec_iv_.9_45;
  vect__1.10_52 = (vector(4) float) _48;
  vect__1.10_53 = (vector(4) float) _49;
  vect__1.10_54 = (vector(4) float) _50;
  mask__3.11_55 = vect__1.10_51 < { 0.0, 0.0, 0.0, 0.0 };
  mask__3.11_56 = vect__1.10_52 < { 0.0, 0.0, 0.0, 0.0 };
  mask__3.11_57 = vect__1.10_53 < { 0.0, 0.0, 0.0, 0.0 };
  mask__3.11_58 = vect__1.10_54 < { 0.0, 0.0, 0.0, 0.0 };
  j_2 = .FOLD_EXTRACT_LAST (j_12, mask__3.11_55, vect_vec_iv_.9_45);

and we removed the old

  j_2 = _3 ? i_14 : j_12;

we are about to insert

  j_2 = .FOLD_EXTRACT_LAST (j_12, mask__3.11_56, _48);

I think correct would be

  j_59 = .FOLD_EXTRACT_LAST (j_12, mask__3.11_55, vect_vec_iv_.9_45);
  j_60 = .FOLD_EXTRACT_LAST (j_59, mask__3.11_56, _48);
  j_61 = .FOLD_EXTRACT_LAST (j_60, mask__3.11_57, _49);
  j_2 = .FOLD_EXTRACT_LAST (j_61, mask__3.11_58, _50);

I'm testing a patch.
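
A scalar model of the chaining proposed above, as a sketch. It assumes the usual meaning of .FOLD_EXTRACT_LAST (the element in the last active lane, or the first operand when no lane is active); the function names and the LANES/COPIES constants are hypothetical:

```c
#define LANES  4   /* elements per vector */
#define COPIES 4   /* vector stmt copies per vectorized iteration */

/* Model of .FOLD_EXTRACT_LAST (else_val, mask, vec): the element of vec in
   the last lane whose mask bit is set, or else_val if no bit is set. */
int fold_extract_last(int else_val, const int mask[LANES], const int vec[LANES])
{
    int result = else_val;
    for (int lane = 0; lane < LANES; ++lane)
        if (mask[lane])
            result = vec[lane];
    return result;
}

/* Each copy feeds the previous copy's result in as its else value, like
   j_59 -> j_60 -> j_61 -> j_2 in the dump above. */
int chained_reduction(int j, int mask[COPIES][LANES], int vec[COPIES][LANES])
{
    for (int copy = 0; copy < COPIES; ++copy)
        j = fold_extract_last(j, mask[copy], vec[copy]);
    return j;
}

/* Reference: the scalar statement j = cond ? val : j over the same elements. */
int scalar_reduction(int j, const int *cond, const int *val, int n)
{
    for (int k = 0; k < n; ++k)
        j = cond[k] ? val[k] : j;
    return j;
}
```

If every copy used the original `j_12` as its else value instead, a match in an earlier copy would be lost whenever a later copy has no active lane.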

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Priority|P3  |P1

--- Comment #2 from Richard Biener  ---
I can reproduce.

[Bug tree-optimization/115382] New: Wrong code with in-order conditional reduction and masked loops

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115382

Bug ID: 115382
   Summary: Wrong code with in-order conditional reduction and
masked loops
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

vectorize_fold_left_reduction does

  if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in,
i);
  else if (is_cond_op)
mask = vec_opmask[i];

that doesn't work - both masks have to be combined.  This for example shows
in a runfail of gcc.dg/vect/vect-cond-reduc-in-order-2-signed-zero.c
with -march=cascadelake --param vect-partial-vector-usage=2 on x86_64.

The len-masking code looks good.
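
A scalar sketch of why both masks are needed, using a hypothetical 4-lane model in which masks are arrays of 0/1 flags. The function names are made up for illustration; this is not the vectorizer code:

```c
#define LANES 4

/* In-order (fold-left) masked add of one vector into the accumulator. */
float fold_left_plus(float acc, const float vec[LANES], const int mask[LANES])
{
    for (int lane = 0; lane < LANES; ++lane)
        if (mask[lane])
            acc += vec[lane];
    return acc;
}

/* A lane may contribute only if it is inside the loop bounds (loop mask)
   AND its condition holds (condition mask). */
void combine_masks(int out[LANES], const int loop_mask[LANES],
                   const int cond_mask[LANES])
{
    for (int lane = 0; lane < LANES; ++lane)
        out[lane] = loop_mask[lane] && cond_mask[lane];
}
```

Using only the loop mask adds elements whose condition is false; using only the condition mask adds lanes past the end of the iteration space.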

[Bug tree-optimization/115381] Missed deoptimization opportunity when comparing two different linker symbols

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115381

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-07
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||hubicka at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Doesn't seem to help here.  Related testcase:

extern int x;
extern int y;

int z(){ return &x == &y; }

possibly -fno-semantic-interposition doesn't cover the definitions being
aliases of each other.  Defining TU:

int x(){}
int __attribute__((alias("x"))) y();

I believe this is wrong-code from clang.

[Bug tree-optimization/115381] Missed deoptimization opportunity when comparing two different linker symbols

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115381

--- Comment #1 from Richard Biener  ---
-fno-semantic-interposition

[Bug target/115373] [15 Regression] RISCV slp-cond-2-big-array.c slp-cond-2.c scan-tree-dump fails since r15-859-geaaa4b88038

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115373

Richard Biener  changed:

   What|Removed |Added

 Target|riscv   |riscv, aarch64

--- Comment #3 from Richard Biener  ---
Same on aarch64.

[Bug target/115375] [15 Regression] RISCV scan failures since 2024-05-04

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115375

Richard Biener  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org
 Target||riscv
   Keywords||testsuite-fail
   Target Milestone|--- |15.0

--- Comment #1 from Richard Biener  ---
Yes, I've seen these in the precommit CI; scan-assembler tests are notoriously
difficult to "adjust" and even analyze.  I left this to risc-v folks assuming
they are fine with this as Richard was fine doing the same for arm.

[Bug c/115374] fmod() in x86_64 -O3 not using return value from the glibc's implementation if x87 FPU fprem returns NaN

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115374

--- Comment #9 from Richard Biener  ---
Yep, it's call DCE which elides the errno setting function call iff the result
is not NaN.

[Bug target/115373] [15 Regression] RISCV slp-cond-2-big-array.c slp-cond-2.c scan-tree-dump fails since r15-859-geaaa4b88038

2024-06-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115373

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
 CC||rguenth at gcc dot gnu.org
 Target||riscv
   Keywords||testsuite-fail

--- Comment #2 from Richard Biener  ---
This also wasn't seen in precommit CI.  I can confirm it on trunk and the
issue is that we prefer load-lanes for f3 instead of SLP.

This issue will go away when we do load-lanes from SLP, so it's intermittent
(but I can't promise any timeline).  I wonder if the FAIL also occurs on
aarch64.  There's vect_load_lanes to eventually "fix" the FAIL by adjusting
the testcase expectation.

[Bug target/115372] [15 Regression] RISCV pr97428.c scan-tree-dump-times after r15-812-gc71886f2ca2

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115372

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
   Keywords||testsuite-fail
 CC||rguenth at gcc dot gnu.org
 Target||riscv

--- Comment #2 from Richard Biener  ---
I don't remember seeing FAIL: gcc.dg/vect/pr97428.c in the precommit CI, this
one should get one SLP instance and seeing zero means it now fails to SLP on
RISC-V.  With a cross and rv64gcv I don't see this failure (on top of trunk).
Ah, for me it's XFAILed because of ! vect_hw_misalign - do you use additional
flags?  But even adding -mno-strict-align doesn't help.

Oh, the dejagnu harness uses check_effective_target_riscv_v_misalign_ok
which _runs_ a testcase ... which of course fails for my simple cc1 cross
(w/o binutils and w/o qemu set up).  Is the precommit CI any better here?

[Bug target/115370] [15 regression] gcc.target/i386/pr77881.c FAIL

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115370

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
   Keywords||missed-optimization

[Bug other/115365] New test case gcc.dg/pr100927.c from r15-1022-gb05288d1f1e4b6 fails

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115365

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #3 from Richard Biener  ---
Fixed I assume.

[Bug c++/115364] [11/12/13/14/15 Regression] ICE-on-invalid when calling non-const template member on const object

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115364

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4

[Bug tree-optimization/115363] Missing loop vectorization due to loop bound load not being pulled out

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115363

Richard Biener  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-06

--- Comment #1 from Richard Biener  ---
Invariant motion doesn't do versioning for aliasing.  But in fact once the
loop iterates array[k] can no longer alias this->size but this is difficult
to exploit (peeling the loop once would help).

I'm not sure we should start to version all those loops where the exit
condition depends on a not hoistable but invariant expression?

But maybe we can diagnose this so people can rewrite their code.
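
A sketch of the kind of source rewrite meant here, using a hypothetical C struct in place of the class from the report (only `array[k]` and `this->size` are known from the discussion). The rewrite is valid only if the array does not actually overlap the size field:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container: the stores are ints and so is the bound, so the
   compiler must assume a store to array[k] could modify size. */
struct int_vec {
    int size;
    int *array;
};

/* The bound v->size is reloaded on every iteration. */
void clear_reload(struct int_vec *v)
{
    assert(v != NULL);
    for (int k = 0; k < v->size; ++k)
        v->array[k] = 0;
}

/* The bound is read once into a local, which no store can alias. */
void clear_hoisted(struct int_vec *v)
{
    assert(v != NULL);
    const int n = v->size;
    for (int k = 0; k < n; ++k)
        v->array[k] = 0;
}
```

With the bound in a local the exit condition is loop-invariant, which is the form the vectorizer needs to compute the iteration count up front.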

[Bug target/115362] fixed_size_simd dot product recognition and sign of determinant not working for stdx::reduce

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115362

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-06
 Status|UNCONFIRMED |WAITING
 Ever confirmed|0   |1

--- Comment #12 from Richard Biener  ---
How should I compile this?

> /space/rguenther/install/gcc-14.1/bin/g++ t.C -std=gnu++2b -mavx2
t.C: In function ‘int main(int, char**)’:
t.C:105:29: warning: ignoring attributes on template argument ‘__m128’
[-Wignored-attributes]
  105 | std::array<__m128, 3> sse =
  | ^
In file included from
/spc/space/rguenther/install/gcc-14.1/lib64/gcc/x86_64-pc-linux-gnu/14.1.0/include/immintrin.h:39,
 from
/spc/space/rguenther/install/gcc-14.1/lib64/gcc/x86_64-pc-linux-gnu/14.1.0/include/x86intrin.h:32,
 from
/spc/space/rguenther/install/gcc-14.1/include/c++/14.1.0/experimental/bits/simd.h:45,
 from
/spc/space/rguenther/install/gcc-14.1/include/c++/14.1.0/experimental/simd:74,
 from t.C:4:
t.C: In static member function ‘static constexpr T math::vec::storage::dot_sse(FIRST, OTHER&& ...) [with FIRST = __vector(4) float; OTHER =
{__vector(4) float&, __vector(4) float&}; T = float; long unsigned int N = 3]’:
t.C:46:91: error: the last argument must be an 8-bit immediate
   46 | constexpr T dot_sse(FIRST first, OTHER&&... other) { return
_mm_dp_ps(first, (... * std::forward(other)), mask4dp(N))[0]; }
  | ^

[Bug lto/115359] ICE in warn_types_mismatch: lto1: internal compiler error: Segmentation fault

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115359

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org,
   ||rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
The issue is probably get_odr_name_for_type returning something non-NULL for both.

But yeah, duping before copying looks wrong since we seem to expect
NULL eventually.

  if (name1 = cplus_demangle (odr1, opts))
{
  name1 = xstrdup (name1);
...

might be even better.

Honza?

[Bug c++/115358] [13/14/15 Regression] template argument deduction/substitution failed in generic lambda function use of static constexpr array type whos initializer defines the size

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115358

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug target/115355] [12/13/14/15 Regression] vectorization exposes wrong code on P9 LE starting from r12-4496

2024-06-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug target/115355] PPCLE: Auto-vectorization creates wrong code for Power9

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115355

Richard Biener  changed:

   What|Removed |Added

 Target||powerpc64le
   Keywords||wrong-code

--- Comment #2 from Richard Biener  ---
wild guess - store-with-len with bogus initial len/bias value?

[Bug tree-optimization/114932] IVopts inefficient handling of signed IV used for addressing.

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114932

--- Comment #10 from Richard Biener  ---
I think the question is why IVOPTs ends up using both the signed and unsigned
variant of the same IV instead of expressing all uses of both with one IV?

That's what I'd look into.

[Bug tree-optimization/115354] [14/15 Regression] Large -Os code size increase related to -ftree-sra

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115354

Richard Biener  changed:

   What|Removed |Added

Summary|Large -Os code size |[14/15 Regression] Large
   |increase related to |-Os code size increase
   |-ftree-sra  |related to -ftree-sra
   Target Milestone|--- |14.2
 CC||jamborm at gcc dot gnu.org
   Keywords||missed-optimization

--- Comment #1 from Richard Biener  ---
The optimization is performed optimistically, anticipating that followup
optimizations will make up for the bloat it immediately causes (that's
what I understand).  I'm not sure if we make any attempt at assessing
how likely that is, but certainly this transform could be disabled when
optimizing for size or for cold calls?

[Bug rtl-optimization/115351] [14/15 regression] pointless movs when passing by value on x86-64

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115351

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
Summary|[14 regression] pointless   |[14/15 regression]
   |movs when passing by value  |pointless movs when passing
   |on x86-64   |by value on x86-64
   Target Milestone|--- |14.2
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-05
  Component|c++ |rtl-optimization
   Keywords||missed-optimization,
   ||needs-bisection

--- Comment #1 from Richard Biener  ---
Confirmed.  The IL we expand from is the same.

[Bug tree-optimization/115347] [12/13/14/15 Regression] wrong code at -O3 on x86_64-linux-gnu

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115347

Richard Biener  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=112859
Version|unknown |14.1.1

--- Comment #2 from Richard Biener  ---
it's loop distribution doing

t2.c:7:12: optimized: Loop nest 1 distributed: split to 2 loops and 0 library
calls.

We get

  for (; f < 1; f++) {
    for (h = 0; h < 2; h++) {
      d = e[f];
    }
  }
  for (; f < 1; f++) {
    for (h = 0; h < 2; h++) {
      g = e[1].c;
      e[f].c = 1;
    }
  }

I think this is similar to the other still open issue where zero-distance
inner loop dependences ([f].c doesn't vary in the inner loop) cause
issues with the interpretation of classical dependence analysis.

I'm somewhat lost there.  PR112859.

[Bug middle-end/115346] [15] Volatile load elimination with packed struct bitfields

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115346

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #4 from Richard Biener  ---
duplicate

*** This bug has been marked as a duplicate of bug 99258 ***

[Bug middle-end/99258] volatile struct access optimized away

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99258

Richard Biener  changed:

   What|Removed |Added

 CC||patrick at rivosinc dot com

--- Comment #4 from Richard Biener  ---
*** Bug 115346 has been marked as a duplicate of this bug. ***

[Bug middle-end/115345] [12/13/14/15 Regression] Different outputs compared to GCC 11- and MSVC/Clang

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115345

--- Comment #12 from Richard Biener  ---
(In reply to Djordje Baljozovic from comment #11)
> (In reply to Djordje Baljozovic from comment #9)
> > (In reply to Andrew Pinski from comment #7)
> > > A few questions, does `-fsanitize=undefined -fsanitize=address` report
> > > anything? Does it work at -O0 and not just -O3? Does adding
> > > -fno-strict-aliasing to the command line "fix" the crash? Are there any
> > > warnings with `-Wextra -Wall` that might be causing an issue?
> > 
> > Have not tested -O0 and -fno-strict-aliasing; will let you know if this
> > fixed the problem.
> > No warnings with -Wextra -Wall to my knowledge.
> > 
> > Sincerely,
> > George
> 
> Hi Andrew and Jakub,
> The results are more than interesting:
> 
> 1. -fno-strict-aliasing: none of the inputs processed (with O3)
> 2. O0: all but one input processed
> 3. O3: none of the inputs processed
> 4. O1 and O2: all inputs processed without any issues -- this did it.
> 
> Now the question is: how on Earth did O1/O2 do the trick, and not O0?!

Can you check whether -O0 works with the other compilers?  It feels like
you might be triggering some undefined behavior in your code.

If you have a short running example that breaks with -O0 it might be
also interesting to run it through valgrind to spot use-after-free
or uninitialized use issues.

> Once again, thanks a lot for your detailed and quick responses.
> George
> P.S. I will keep @Jakub's bisect idea in mind if something like this happens
> in the future.

[Bug tree-optimization/115344] Missing loop counter reversal

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115344

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-06-05

--- Comment #1 from Richard Biener  ---
IVOPTs can do this with, and I think also without, the help of IVCANON,
which could add a decrementing IV (for some reason it only does that for
a constant number of iterations).

I'm not sure why, for this example, IVOPTs doesn't add a candidate IV
that decrements to zero.  I see

Predict doloop failure due to target specific checks.

so the doloop candidate isn't added?

[Bug target/115342] [14/15 Regression] AArch64: Function multiversioning initialization incorrect

2024-06-05 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115342

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.2

[Bug tree-optimization/113910] [12 Regression] Factor 15 slowdown compiling AMDGPUDisassembler.cpp on SPARC

2024-06-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113910

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
  Known to work||12.3.1
 Status|ASSIGNED|RESOLVED

--- Comment #20 from Richard Biener  ---
Fixed.

[Bug tree-optimization/110381] [11 Regression] double counting for sum of structs of floating point types

2024-06-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110381

Richard Biener  changed:

   What|Removed |Added

Summary|[11/12 Regression] double   |[11 Regression] double
   |counting for sum of structs |counting for sum of structs
   |of floating point types |of floating point types
   Priority|P3  |P2
  Known to fail||12.3.0
  Known to work||12.3.1

[Bug tree-optimization/115340] Loop/SLP vectorization possible inefficiency

2024-06-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115340

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2024-06-04
 Blocks||53947
 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
The issue is that the DRs for the loads tmp[0][i] and tmp[1][i] are not
related - they are off different base pointers.  At the moment we are
not merging unrelated "groups" (even though the loads are not marked
as grouped) into one SLP node.

The stores are not considered "grouped" because they have gaps.

With SLP-ification you'd get four instances and the same code-gen as now.

To do better we'd have to improve the store dataref analysis to see
that a vectorization factor of four would "close" the gaps, or more
generally support store groups with gaps.  Stores with gaps can be
handled by masking for example.

You get the store side handled when using -fno-tree-loop-vectorize to
get basic-block vectorization after unrolling the loop.  But you
still run into the issue that we do not combine from different load
groups during SLP discovery.  That's another angle you can attack;
during greedy discovery we also do not consider splitting the store
but instead build the loads from scalars which is of course less than
optimal.  Also since we do not re-process the built vector CTORs for
further basic-block vectorization opportunities.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations

[Bug c++/115331] [13/14/15 Regression] ICE-on-invalid passing a typoed lambda to a list-initializer

2024-06-04 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115331

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P4

[Bug c/115326] __builtin_sub_overflow reports incorrect overflow value

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115326

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
 CC||jakub at gcc dot gnu.org

--- Comment #1 from Richard Biener  ---
We lower it as

  int overflow1 = r->as_u64[0] = REALPART_EXPR <.SUB_OVERFLOW ((uint64_t)
a->as_u64[0], (uint64_t) b->as_u64[0])>, (int) (_Bool) IMAGPART_EXPR
<.SUB_OVERFLOW ((uint64_t) a->as_u64[0], (uint64_t) b->as_u64[0])>;

where the assignment to r->as_u64[0] is done before the re-evaluation
for the overflow bit.  A SAVE_EXPR is missing here?  Jakub?

[Bug lto/115327] [ld] [lto] using ld and lto, crash while dynamic compile executable

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115327

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID
 Target||arm

--- Comment #1 from Richard Biener  ---
This bugzilla is for GCC but you are using clang.  If you want to report a bug
in binutils BFD ld their bugzilla is sourceware.org/bugzilla

[Bug gcov-profile/114751] .gcda:stamp mismatch with notes file

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114751

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID
 CC||aoliva at gcc dot gnu.org

--- Comment #10 from Richard Biener  ---
GCC 11 indeed had a big revamp of how auxiliary files (like .gcno) are named.
In case of a single source file as in

  gcc   -c src-file.c   -o src-file.refo

the auxiliary files are now named after the output file name with the
extension stripped.  So for the above it should be src-file.gcno, the
same as with -o src-file.o, whereas with GCC 10 or earlier you'd get
src-file.refo-src-file.gcno.

The
https://gcc.gnu.org/onlinedocs/gcc-11.4.0/gcc/Overall-Options.html#index-dumpbase
documentation explains this in detail.  It was previously
inconsistent but notably it's now different than it was before.

Thanks for tracking the issue down, I consider this not a bug now but
CCed Alex who implemented this change in case he has anything to add
to the observed auxiliary file conflict of

 gcc -c src-file.c -o src-file.refo

and

 gcc -c src-file.c [-o src-file.o]

[Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Richard Biener  changed:

   What|Removed |Added

 Target|sparc*-sun-solaris2.11 GCN  |GCN

--- Comment #8 from Richard Biener  ---
Should be fixed on sparc.

[Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

--- Comment #6 from Richard Biener  ---
For GCN the issue is that with vector(64) unsigned short we fail the permute
(but we have { target vect64 } for this reason), but we then re-try with
the same mode but with SLP disabled and that succeeds.

The best strategy for GCN would be to gather V4QImode aka SImode into the
V64QImode (or V16SImode) vector.  For pix2 we have a gap of 28 elements,
doing consecutive loads isn't a good strategy here.

On x86 we can use a small vector and use half of it (gathers would be slow).

On sparc we start with V8QImode which is great but then sparc doesn't seem
able to build a V8QImode vector from two V4QImode vectors or have
V2SImode and build from two SImode values (and load SImode from pix1/pix2,
that possibly due to alignment).  I do see a vec_initv2sisi though.  Ah,
so we verify we can do the load using a permutation, permute two V8QImode
'a' and 'b' to get you a { a_low, b_low } V8QImode vector.  The other
part is eliding of the gap that will end up loading half of the vector
but then pad it out as { a_low, 0 } but then still invoke this unsupported
permutation to get { a_low, b_low }.  So in this case requiring vect_perm
would fix this though there is sparc_vectorize_vec_perm_const and vec_perm<>
guarded with VIS2, with -mvis2 we get past this failure point and run into

missed:   not vectorized: relevant stmt not supported: _35 = (unsigned short)
_34;

So there's no vec_upack_{hi,lo}_v4hi.  vect_unpack guards this.

Maybe I should move the test to be x86 specific.

I'll add the two dg-effective targets to fix the solaris fallout for now.

[Bug c++/95349] Using std::launder(p) produces unexpected behavior where (p) produces expected behavior

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349

--- Comment #48 from Richard Biener  ---
(In reply to Christopher Nerz from comment #47)
> But shouldn't both give the same value?

I'm not sure what the standard says to this.  Does std::launder(...)
sanitize earlier "undefined behavior"?  For example failing to initialize
an object?

> The return of the new and the std::launder(...) point to the same object and
> are both equal read-operations! It is imho not predictable that they behave
> differently.

One load we can optimize to a constant, the other not (because of .LAUNDER).

[Bug c++/95349] Using std::launder(p) produces unexpected behavior where (p) produces expected behavior

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95349

Richard Biener  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=101641

--- Comment #46 from Richard Biener  ---
(In reply to Christopher Nerz from comment #45)
> This is a critical bug which renders gcc unusable for safety relevant
> systems using expected/variant or simple ipc.
> 
> You can get the same buggy behavior with far simpler code:
> https://godbolt.org/z/1WTnnYceM
> 
> 
> #include 
> #include 
> 
> bool check()
> {
> // Just to prove that it is not a problem with alignment etc.
> static_assert(alignof(double) == alignof(std::uint64_t));
> static_assert(sizeof(double) == sizeof(std::uint64_t));
> 
> alignas(8) std::byte buffer[8]; // some buffer
> new (buffer) double{1}; // some completely trivial data
> // reuse memory -> double ends lifetime, uint64 starts lifetime
> std::uint64_t * res = new (buffer) std::uint64_t;
> // *res is allowed to be used as it is the correct pointer returned by
> new
> // *res == 0x3ff0 // and gives correct value
> // The very definition of std::launder says that it is suppose to be
> used as:
> return (*res == *std::launder(reinterpret_cast<std::uint64_t*>(buffer)));
> }
> 
> int main(int argc, char **argv) {
> return check(); // gives false with activatred O2 (true with O0)
> }
> 
> 
> We get the same behavior when initialisating the memory at our version of
> "std::uint64_t * res = new (buffer) std::uint64_t;", but were unable to give
> a minimal example for that behavior.

For this case we end up with an indeterminate value for 'buffer' read as
uint64_t but that indeterminate value is different from the one read after
.LAUNDER.  A somewhat early IL is

  MEM[(double *)&buffer] = 1.0e+0;
  _1 = MEM[(uint64_t *)&buffer];
  _12 = .LAUNDER (&buffer);
  _3 = *_12;
  _13 = _1 == _3;

we then re-interpret 1.0e+0 as uint64_t and then remove the store as dead
because there's no valid use - the *_12 load is done as uint64_t.
The effect is that the later load reads from uninitialized stack.

Note that .LAUNDER only constitutes a data dependence between the &buffer
and _12 pointer _values_ but there's no dependence of the memory contents
pointed to - .LAUNDER is ECF_NOVOPS.  That makes the compiler forget
what _12 points to but it doesn't make later uint64 loads valid from
*_12 from an earlier store to double.

[Bug pch/115312] [14/15 Regression] ICE when including a PCH via compiler option -include

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115312

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |14.2

[Bug c/115310] Option -Werror=return-type is too aggressive with -std=gnu89

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115310

--- Comment #6 from Richard Biener  ---
(In reply to Florian Weimer from comment #3)
> This is just following the previous GCC behavior. For example, with GCC 11:
> 
> $ gcc -S -Werror=return-type -std=gnu89 t.c
> t.c:1:1: error: return type defaults to ‘int’ [-Werror=return-type]
> 1 | main () { return 0; }
>   | ^~~~
> 
> I'm not sure how this is a problem in practice.
> 
> Using -Werror=return-type at the distribution level is … problematic. It's
> why we split -Werror=return-mismatch from it, and only enabled the latter by
> default in GCC 14.

But -Wreturn-mismatch doesn't diagnose the following, only -Wreturn-type does.
IIRC we made -Werror=return-type the default mainly because of this.

int foo()
{
}

I realize -std=gnu89 isn't perfect but if sources are happy with that
it's much better than -fpermissive - not only because -fpermissive
only works (is not diagnosed) with GCC14 for C.

I also realize -std=gnu89 is going to run into this very same issue with
older compilers.  Bah.

[Bug c/115311] -fno-builtin-xxx allowing anything for xxx

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115311

--- Comment #3 from Richard Biener  ---
Note we handle -Wno-xyz similarly, but of course a typo like -fno-builtin-sun
(s/sun/sin) isn't noticed this way which is the drawback.

[Bug target/115255] sibcall at -O0 causes ICE in df_refs_verify on arm

2024-06-03 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115255

Richard Biener  changed:

   What|Removed |Added

 CC|richard.guenther at gmail dot com  |rguenth at gcc dot gnu.org

--- Comment #8 from Richard Biener  ---
(In reply to Andrew Pinski from comment #5)
> The question comes is musttail going to always work at -O0 or should it just
> fail at -O0 with an error message. Or rather is musttail is just a hack in
> itself and should never be implemented.

I think it's going to be quite useless if it doesn't work at -O0.  I suppose
even demoting the musttail error to a warning when not optimizing
will be an improvement.  OTOH doing that generally (a warning, not error)
might be a possibility as well.  This isn't going to be a very portable
feature since the ability to tail-call depends on the ABI.

[Bug c/115310] Option -Werror=return-type is too aggressive with -std=gnu89

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115310

Richard Biener  changed:

   What|Removed |Added

 CC||fweimer at redhat dot com

--- Comment #1 from Richard Biener  ---
The logic that triggers is

  if (warn_about_return_type)
    permerror_opt (loc, flag_isoc99 ? OPT_Wimplicit_int
                   : (warn_return_type > 0 ? OPT_Wreturn_type
                      : OPT_Wimplicit_int),
                   "return type defaults to %<int%>");

and it's all documented this way.  We have -Werror=return-type to detect
the case "Also warn if execution may reach the end of the function
body, or if the function does not contain any return statement at all."

It would be nice if -std=gnu89 -Werror=return-type -Wno-implicit-int
would disable this particular instance about implicit int typed functions.

It's really ugly to force old code to use -fpermissive instead of the
much cleaner -std=gnu89 just because formerly, with the default of
newer -std, we only had a warning for the implicit int while with
-std=gnu89 we now get an error for it.  Did I say I dislike -fpermissive?
(which also gets you diagnostics for older compilers, so packages building
in multiple distributions get more difficult to maintain)

[Bug target/115307] [avr] Don't expand isinf() like a built-in

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115307

--- Comment #1 from Richard Biener  ---
The issue is that we probably fold isinff early.  On x86 I see already in
.original:

  return !(ABS_EXPR  u<= 3.4028234663852885981170418348451692544e+38);

I think your option is to provide optabs for isinf but make expansion
of them always FAIL; (which is of course a quite ugly way)

[Bug target/115282] [15 regression] gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c fails after r15-812-gc71886f2ca2e46

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115282

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Priority|P3  |P1
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
 Target|powerpc64-linux-gnu |powerpc64*-linux-gnu
 Status|NEW |ASSIGNED

--- Comment #3 from Richard Biener  ---
Ah, this is probably a case where we need to split because CSE causes us to
associate operations differently so SLP build for the whole thing fails.

The three-vector permute issue will go away when I manage to finish the load
part of the full SLP enablement.

It also fails on LE.  It's the

node 0x39913f0 (max_nunits=4, refcnt=2) vector(4) unsigned int
op template: _14 = in[_13];
stmt 0 _14 = in[_13];
load permutation { 6 }

note.  We split the 8-group into 6 and two times 1 element.  This needs
an intermediate (interleaving) permute and indeed the load part will fix it.

I suggest to leave this failing until then.  The loop is still vectorized
but using non-SLP full interleaving until then.

[Bug tree-optimization/115303] gcc.dg/vect/pr112325.c FAILs

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115303

--- Comment #2 from Richard Biener  ---
Yeah, if requiring vect_shift works for you that's pre-approved.

[Bug tree-optimization/115304] gcc.dg/vect/slp-gap-1.c FAILs

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115304

Richard Biener  changed:

   What|Removed |Added

   Keywords||testsuite-fail

--- Comment #2 from Richard Biener  ---
It should only need vect32 - basically I assumed the target can compose the
64bit vector from two 32bit elements.  But it might be that for this to work
the loads would need to be aligned.

What is needed is char-to-short unpacking and vector composition.  Either
composing V2SImode or V8QImode from two V4QImode vectors.

Does the following help?

diff --git a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
index 36463ca22c5..08942380caa 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-gap-1.c
@@ -4,6 +4,9 @@
 typedef unsigned char uint8_t;
 typedef short int16_t;
 void pixel_sub_wxh(int16_t * __restrict diff, uint8_t *pix1, uint8_t *pix2) {
+  diff = __builtin_assume_aligned (diff, __BIGGEST_ALIGNMENT__);
+  pix1 = __builtin_assume_aligned (pix1, 4);
+  pix2 = __builtin_assume_aligned (pix2, 4);
   for (int y = 0; y < 4; y++) {
 for (int x = 0; x < 4; x++)
   diff[x + y * 4] = pix1[x] - pix2[x];

[Bug ada/115305] [15 Regression] many (162) acats regressions on i686-darwin9

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115305

Richard Biener  changed:

   What|Removed |Added

 Target||i686-darwin9
   Target Milestone|--- |15.0

[Bug tree-optimization/115278] [13/14 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278

Richard Biener  changed:

   What|Removed |Added

Summary|[13/14/15 Regression]   |[13/14 Regression]
   |-ftree-vectorize optimizes  |-ftree-vectorize optimizes
   |away volatile write on  |away volatile write on
   |x86_64 since r13-3219   |x86_64 since r13-3219
  Known to work||15.0

--- Comment #10 from Richard Biener  ---
Fixed on trunk so far.

[Bug tree-optimization/115278] [13/14/15 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278

--- Comment #6 from Richard Biener  ---
(In reply to avieira from comment #5)
> > I think we fixed similar bug on the read side.
> 
> I don't have the best memory, but the one I can remember is PR 111882, where
> we had the SAVE_EXPR. And the fix was to not lower bitfields with
> non-constant offsets.
> 
> Should dse_classify_store not return *_DEAD for volatiles?

It's a low-level worker, it relies on the caller to have performed sanity
checking on the stmt itself.  I'm testing a patch doing that.

[Bug lto/115300] gcc 14 cannot compile itself on Windows when bootstrap-lto is specified

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115300

--- Comment #3 from Richard Biener  ---
Can you try --disable-plugin?  It might be the mingw equivalent of exporting
all dynamic symbols from the cc1 binary runs into target limitations?  It looks
like the default on *-*-mingw* is disabled though ...

[Bug tree-optimization/115278] [13/14/15 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278

Richard Biener  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
It's actually a latent issue, unrelated to bitfields?  We elide the store via

  tree lhs = gimple_get_lhs (stmt);
  ao_ref write;
  ao_ref_init (&write, lhs);

  if (dse_classify_store (&write, stmt, false, NULL, NULL, latch_vdef)
      == DSE_STORE_DEAD)
    delete_dead_or_redundant_assignment (&gsi, "dead");

but that fails to guard against volatiles.

[Bug rtl-optimization/115297] [14/15 regression] alpha: ICE in simplify_subreg, at simplify-rtx.cc:7554 with -O1

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115297

Richard Biener  changed:

   What|Removed |Added

Summary|[14 regression] alpha: ICE  |[14/15 regression] alpha:
   |in simplify_subreg, at  |ICE in simplify_subreg, at
   |simplify-rtx.cc:7554 with   |simplify-rtx.cc:7554 with
   |-O1 |-O1
   Target Milestone|--- |14.2

[Bug testsuite/115294] [15 regression] dg-additional-files-options change broke several testsuites

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115294

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug ada/115292] [15 Regression] i686-darwin17 bootstrap fails for Ada (between r15-856 and r15-889)

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115292

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
Version|9.0 |15.0

[Bug c/115290] [12/13/14/15 Regression] tree check fail in c_tree_printer, at c/c-objc-common.cc:330

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115290

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug tree-optimization/115278] [13/14/15 Regression] -ftree-vectorize optimizes away volatile write on x86_64 since r13-3219

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115278

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #3 from Richard Biener  ---
I think we fixed similar bug on the read side.

[Bug middle-end/115277] [13/14/15 regression] ICF needs to match loop bound estimates

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115277

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |13.4

[Bug tree-optimization/115298] [15 Regression] Various targets failing DSE tests after recent changes

2024-05-31 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115298

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-05-31
   Keywords||testsuite-fail
 Ever confirmed|0   |1
   Target Milestone|--- |15.0

--- Comment #1 from Richard Biener  ---
Huh, I honestly have no idea how those targets would differ here ...

I do see

void h (char * s)
{
  # PT = anything
  char * s_3(D) = s;
  char a[8];

   :
  __builtin_memset (&a, 0, 8);
  __builtin_strncpy (&a, s_3(D), 8);
  # USE = anything
  # CLB = anything
  frob (&a);
  a ={v} {CLOBBER(eos)};
  return;

for nds32-sim but

  Deleted dead call: __builtin_memset (&a, 0, 8);

void h (char * s)
{
  # PT = nonlocal null
  char * s_3(D) = s;
  char a[8];

   :
  __builtin_strncpy (&a, s_3(D), 8);
  # USE = nonlocal escaped null { D.2716 } (escaped)
  # CLB = nonlocal escaped null { D.2716 } (escaped)
  frob (&a);
  a ={v} {CLOBBER(eos)};
  return;

for x86-64.  But then the points-to solutions should not make any difference
for DSE in this case ... (the points-to difference is odd in the first place
of course).

So for the points-to difference this is caused by

-a = 
+a = INTEGER

which likely means a different default of -fno-delete-null-pointer-checks
or ADDR_SPACE_ADDRESS_ZERO_VALID.  That causes us to bring in what the
object at (void *)0 points to, and that's ANYTHING since we do not track
objects at constant addresses in any way, and those might alias all other
objects.  The question is more why we generate a =  at all, but that's
a pre-existing issue.  We now simply handle all this correctly (we didn't
before, with latent wrong-code).

Ah, and the DSE effect then is obviously that now 'strncpy (&a, s_3(D),..)'
reads from a since s_3(D) points to anything now (which includes 'a'), so
we can no longer remove/trim an earlier store to 'a'.

Ah, and the a =  constraint is from the memset.

Since we pass a to frob it escapes and everything escaped memory points
to also escapes so anything escapes.

So I'd say it works correctly now.

There might be a missing indirection between NONLOCAL and ESCAPED.  Since
s =  even when anything is in ESCAPED anything isn't NONLOCAL
itself (well, but of course technically s can point to NULL as well -
another latent incorrectness in PTA, we do not track NULL conservatively,
a correctness mistake with ADDR_SPACE_ADDRESS_ZERO_VALID).

Btw, changing the testcases to

extern void frob (char *);

void h (char *s)
{
  char a[8];
  __builtin_memset (a, 1, sizeof a);
  __builtin_strncpy (a, s, sizeof a);
  frob (a);
}

shows the same effect on x86_64 - suddenly 'a' points to ANYTHING
(0x010101010101...), which makes 's' point to ANYTHING and DSE is gone.

Confirmed for the testsuite regression.  I don't see how this is a bug
though.  Maybe the stack object 'a' can never be at address zero?  Or
any "fixed" address?  I'm not sure that such constraint can be modeled in PTA
("split" ANYTHING somehow).

Adding -fdelete-null-pointer-checks to the test makes it succeed also on
nds32le-elf.

[Bug target/115282] [15 regression] gcc.dg/vect/costmodel/ppc/costmodel-slp-12.c fails after r15-812-gc71886f2ca2e46

2024-05-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115282

Richard Biener  changed:

           What    |Removed                     |Added

   Target Milestone|---                         |15.0
           Keywords|                            |testsuite-fail
          Component|other                       |target
            Summary|15 regression]              |[15 regression]
                   |gcc.dg/vect/costmodel/ppc/c |gcc.dg/vect/costmodel/ppc/c
                   |ostmodel-slp-12.c fails     |ostmodel-slp-12.c fails
                   |after                       |after
                   |r15-812-gc71886f2ca2e46     |r15-812-gc71886f2ca2e46

--- Comment #1 from Richard Biener  ---
I don't see a good reason why, but I don't have a BE cross around to check
myself.  Does BE vect maybe not have unsigned integer vector multiplication
support?
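
A minimal sketch of the kind of kernel the question is about, assuming a
simple element-wise unsigned multiply.  This is not the content of
costmodel-slp-12.c, and the function name mul_u32 is hypothetical.  Whether
such a loop is vectorized depends on the target providing a vector unsigned
integer multiply; -fopt-info-vec can be used to see what the vectorizer did.

```c
enum { LEN = 4 };

/* Hypothetical kernel: element-wise unsigned multiplication over a
   small fixed-size group, a typical vectorization candidate.  */
void mul_u32 (unsigned int *restrict out,
              const unsigned int *restrict a,
              const unsigned int *restrict b)
{
  for (int i = 0; i < LEN; i++)
    out[i] = a[i] * b[i];
}
```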

[Bug tree-optimization/115275] [14/15 Regression] Missed optimization for Dead Code Elimination

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115275

Richard Biener  changed:

           What    |Removed                     |Added

      Known to work|                            |13.3.0
           Keywords|                            |missed-optimization,
                   |                            |needs-bisection
           Priority|P3                          |P2
      Known to fail|                            |14.1.0, 15.0
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Target Milestone|---                         |14.2
   Last reconfirmed|                            |2024-05-29

--- Comment #1 from Richard Biener  ---
Confirmed.

[Bug sanitizer/115273] [12 Regression] passing zero to ctz() check missing

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115273

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |12.4

[Bug debug/115272] [debug] complex type is hard to related back to base type

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115272

--- Comment #2 from Richard Biener  ---
(In reply to Richard Biener from comment #1)
> How does it work for 'double' vs. 'long double' themselves?
> 
>  <1><32>: Abbrev Number: 3 (DW_TAG_base_type)
> <33>   DW_AT_byte_size   : 16
> <34>   DW_AT_encoding: 4(float)
> <35>   DW_AT_name: (indirect string, offset: 0x60): long double
> 
> so if it's not distinguishable via DW_AT_byte_size you look into
> DW_AT_name as well?  So it looks like doing the same for _Complex long double
> is perfectly in line?

Take for example powerpc with its dual IEEE and IBM long double 128 format.

[Bug debug/115272] [debug] complex type is hard to related back to base type

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115272

--- Comment #1 from Richard Biener  ---
How does it work for 'double' vs. 'long double' themselves?

 <1><32>: Abbrev Number: 3 (DW_TAG_base_type)
<33>   DW_AT_byte_size   : 16
<34>   DW_AT_encoding: 4(float)
<35>   DW_AT_name: (indirect string, offset: 0x60): long double

so if it's not distinguishable via DW_AT_byte_size you look into
DW_AT_name as well?  So it looks like doing the same for _Complex long double
is perfectly in line?
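
A small sketch for reproducing the comparison, assuming the file is compiled
with something like gcc -g -c and the result inspected with
readelf --debug-dump=info.  The variable and function names are hypothetical.
Each variable should get its own DW_TAG_base_type entry; the sizes are
target-dependent, so none are stated here.

```c
#include <stdio.h>

/* One variable per base type so each shows up in the debug info.  */
double d;
long double ld;
_Complex long double cld;

/* Prints the sizes that end up in DW_AT_byte_size on this target.  */
void print_sizes (void)
{
  printf ("double:               %zu\n", sizeof d);
  printf ("long double:          %zu\n", sizeof ld);
  printf ("_Complex long double: %zu\n", sizeof cld);
}

/* A complex type is laid out as two elements of its real type, so its
   size alone identifies the base type only when no other real type
   shares that size.  */
int complex_is_two_reals (void)
{
  return sizeof cld == 2 * sizeof ld;
}
```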

[Bug tree-optimization/115252] The SLP vectorizer failed to perform automatic vectorization on pixel_sub_wxh of x264

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115252

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
 Target||x86_64-*-*

--- Comment #3 from Richard Biener  ---
This testcase should be fixed now.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 115252, which changed state.

Bug 115252 Summary: The SLP vectorizer failed to perform automatic 
vectorization on pixel_sub_wxh of x264
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115252

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/114435] PCOM messes up vectorization some times

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435

--- Comment #10 from Richard Biener  ---
(In reply to Richard Biener from comment #9)
> So the "pcom messes up SLP" part should be fixed now.  The pass dependence
> of invariant/store motion and unswitching (and likely also loop splitting) is
> something different.  We may want to track this in a separate bug.

Note there's a conditional (on graphite) LIM pass after high-level loop opts,
it might be an option to turn it into an unconditional instance.

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 114435, which changed state.

Bug 114435 Summary: PCOM messes up vectorization some times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 114435, which changed state.

Bug 114435 Summary: PCOM messes up vectorization some times
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug tree-optimization/114435] PCOM messes up vectorization some times

2024-05-29 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114435

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Richard Biener  ---
So the "pcom messes up SLP" part should be fixed now.  The pass dependence of
invariant/store motion and unswitching (and likely also loop splitting) is
something different.  We may want to track this in a separate bug.
