[Bug tree-optimization/107838] spurious "may be used uninitialized" warning on variable initialized at the first iteration of a loop

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107838

--- Comment #4 from Richard Biener  ---
comment#3 is a slightly different issue in that we do not have the guarding
condition obviously true here; instead, what we'd need to prove is that
r_8 is always initialized because the first loop iteration initializes it.

Maybe it's somehow possible to code that into the uninit analysis machinery,
I'd have to think about this.

[Bug analyzer/108252] false positive: leak detection

2023-01-11 Thread chipitsine at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108252

--- Comment #5 from Илья Шипицин  ---
thank you, David!

I'll rerun haproxy check soon

[Bug libstdc++/108221] Building cross compiler for H8 family fails at libstdc++-v3/src/c++20/tzdb.cc

2023-01-11 Thread jdx at o2 dot pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108221

--- Comment #18 from Jan Dubiec  ---
Thanks Jonathan. I confirm: I have successfully built master (554bb9b6) for
both targets. Note that I'm still using binutils 2.39; I haven't tried its
current master yet.

[Bug preprocessor/108244] [13 Regression] `pragma GCC diagnostic` and -E -fdirectives-only causes the preprocessor to become confused since r13-1544-ge46f4d7430c52104

2023-01-11 Thread lhyatt at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108244

Lewis Hyatt  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #12 from Lewis Hyatt  ---
Fixed by r13-5114.

[Bug preprocessor/108244] [13 Regression] `pragma GCC diagnostic` and -E -fdirectives-only causes the preprocessor to become confused since r13-1544-ge46f4d7430c52104

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108244

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Lewis Hyatt :

https://gcc.gnu.org/g:9ca4899144de6db61a782b03a1257489bd26f750

commit r13-5114-g9ca4899144de6db61a782b03a1257489bd26f750
Author: Lewis Hyatt 
Date:   Thu Dec 29 16:55:21 2022 -0500

preprocessor: Don't register pragmas in directives-only mode [PR108244]

libcpp's directives-only mode does not expect deferred pragmas to be
registered, but to date the c-family registration process has not checked for
this case. That issue became more visible since r13-1544, which added the
commonly used GCC diagnostic pragmas to the set of those registered in
preprocessing modes. Fix it by checking for directives-only mode in
c-family/c-pragma.cc.

gcc/c-family/ChangeLog:

PR preprocessor/108244
* c-pragma.cc (c_register_pragma_1): Don't attempt to register any
deferred pragmas if -fdirectives-only.
(init_pragma): Likewise.

gcc/testsuite/ChangeLog:

* c-c++-common/cpp/pr108244-1.c: New test.
* c-c++-common/cpp/pr108244-2.c: New test.
* c-c++-common/gomp/pr108244-3.c: New test.
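
For illustration only, here is a minimal sketch of the kind of input this fix concerns: a translation unit that uses the GCC diagnostic pragmas, which one could run through `gcc -E -fdirectives-only`. This is an assumed example, not the contents of the new pr108244 tests, and the function name `pr108244_sketch` is hypothetical.

```c
/* Hypothetical input; NOT taken from the pr108244-*.c tests.  */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-variable"

int pr108244_sketch (void)
{
  int unused;   /* would normally trigger -Wunused-variable under -Wall */
  return 0;
}

#pragma GCC diagnostic pop
```

The pragmas only affect diagnostics, so the function behaves the same with or without them.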

[Bug tree-optimization/108379] -Wmaybe-uninitialized false positive on conditional use

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108379

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||diagnostic

--- Comment #3 from Andrew Pinski  ---
I suspect that pipe_check being a global is throwing off the uninitialized
warning's conditional checks, because GCC thinks it can be modified between
the 3 checks of pipe_check when out_pollable is used.

So if you make a local copy of pipe_check inside tee_files to cache it, and use
that, the warning will go away.
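
A minimal sketch of that suggestion, assuming a much simplified stand-in for tee_files (the function `tee_like` and its body are hypothetical; only the names `pipe_check` and `out_pollable` come from the report). The global is read once into a local, so every later test sees the same value:

```c
#include <stdbool.h>
#include <stdlib.h>

bool pipe_check;   /* global flag, named as in the report */

/* Hypothetical, simplified stand-in for tee_files().
   Returns the number of "pollable" slots, or -1 on allocation failure.  */
int tee_like (int nfiles)
{
  const bool check = pipe_check;   /* cache the global in a local */
  bool *out_pollable;              /* only set and used when check is true */
  int pollable = 0;

  if (check)
    {
      out_pollable = malloc ((size_t) nfiles * sizeof *out_pollable);
      if (!out_pollable)
        return -1;
    }

  for (int i = 0; i < nfiles; i++)
    if (check)
      {
        out_pollable[i] = (i % 2 == 0);   /* placeholder for the real check */
        pollable += out_pollable[i];
      }

  if (check)
    free (out_pollable);
  return pollable;
}
```

Whether this silences the warning on a given GCC version is not verified here; the point is only that all three guards now test the same local value.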

[Bug target/96373] SVE miscompilation on vectorized division loop, leading to FP exception

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96373

--- Comment #13 from Andrew Pinski  ---

  vect__1.9_40 = .MASK_LOAD (_13, 64B, loop_mask_39);
  _15 =   [(double *)s_9(D) + ivtmp_48 * 8];
  vect__2.12_43 = .MASK_LOAD (_15, 64B, loop_mask_39);
  vect__3.13_44 = vect__1.9_40 / vect__2.12_43;
  .MASK_STORE (_13, 64B, loop_mask_39, vect__3.13_44);

The divide should have been masked using the loop_mask_39 too.

[Bug target/96373] SVE miscompilation on vectorized division loop, leading to FP exception

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96373

Andrew Pinski  changed:

   What|Removed |Added

 CC||evatux at gmail dot com

--- Comment #12 from Andrew Pinski  ---
*** Bug 108378 has been marked as a duplicate of this bug. ***

[Bug target/108378] gcc generates fpu traps unsafe code for armv8-a+sve

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108378

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 96373.
Basically fdivr should be using the predicate of p0 from the load rather than
the p1 predicate.

*** This bug has been marked as a duplicate of bug 96373 ***

[Bug tree-optimization/108379] -Wmaybe-uninitialized false positive on conditional use

2023-01-11 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108379

--- Comment #2 from Sam James  ---
Minimum command needed to reproduce:
```
$ gcc-13 -Werror -Wuninitialized -O2 -c tee.i
src/tee.c: In function 'tee_files':
src/tee.c:272:25: error: 'out_pollable' may be used uninitialized
[-Werror=maybe-uninitialized]
src/tee.c:238:8: note: 'out_pollable' was declared here
cc1: all warnings being treated as errors
```

[Bug tree-optimization/108379] -Wmaybe-uninitialized false positive on conditional use

2023-01-11 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108379

--- Comment #1 from Sam James  ---
Created attachment 54252
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54252&action=edit
coreutils-tee.patch

[Bug tree-optimization/108379] New: -Wmaybe-uninitialized false positive on conditional use

2023-01-11 Thread sam at gentoo dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108379

Bug ID: 108379
   Summary: -Wmaybe-uninitialized false positive on conditional
use
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: sam at gentoo dot org
  Target Milestone: ---

Created attachment 54251
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54251&action=edit
tee.i

Occurs when applying a patch to coreutils on top of its git repo at
7fc84d1c0f6b35231b0b4577b70aaa26bf548a7c (attached for completeness):
```
gcc-13  -I. -I./lib  -Ilib -I./lib -Isrc -I./src  -Werror -fno-common -Wall
-Warith-conversion -Wbad-function-cast -Wcast-align=strict -Wdate-time
-Wdisabled-optimization -Wdouble-promotion -Wduplicated-branches
-Wduplicated-cond -Wextra -Wformat-signedness -Winit-self -Winvalid-pch
-Wlogical-op -Wmissing-declarations -Wmissing-include-dirs -Wmissing-prototypes
-Wnull-dereference -Wold-style-definition -Wopenmp-simd -Woverlength-strings
-Wpacked -Wpointer-arith -Wshadow -Wstrict-overflow -Wstrict-prototypes
-Wsuggest-attribute=cold -Wsuggest-attribute=const -Wsuggest-attribute=format
-Wsuggest-attribute=malloc -Wsuggest-attribute=noreturn
-Wsuggest-attribute=pure -Wsuggest-final-methods -Wsuggest-final-types
-Wsync-nand -Wtrampolines -Wuninitialized -Wunknown-pragmas -Wunused-macros
-Wvariadic-macros -Wvla -Wwrite-strings -Warray-bounds=2 -Wattribute-alias=2
-Wbidi-chars=any,ucn -Wformat=2 -Wimplicit-fallthrough=5 -Wshift-overflow=2
-Wuse-after-free=3 -Wunused-const-variable=2 -Wvla-larger-than=4031
-Wno-sign-compare -Wno-unused-parameter -Wno-format-nonliteral
-fdiagnostics-show-option -funit-at-a-time -Wno-return-local-addr -g -O2 -MT
src/tee.o -MD -MP -MF $depbase.Tpo -c -o src/tee.o src/tee.c &&\
mv -f $depbase.Tpo $depbase.Po
src/tee.c: In function 'tee_files':
src/tee.c:272:25: error: 'out_pollable' may be used uninitialized
[-Werror=maybe-uninitialized]
  272 | out_pollable[i] = iopoll_output_ok (fileno (descriptors[i]));
  | ^~~~
src/tee.c:238:9: note: 'out_pollable' was declared here
  238 |   bool *out_pollable;
  | ^~~~
cc1: all warnings being treated as errors
```

Reproduced with both GCC 12.2.1 20230107 and GCC 13.0.0 20230108.

Attached preprocessed source.

[Bug target/108378] New: gcc generates fpu traps unsafe code for armv8-a+sve

2023-01-11 Thread evatux at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108378

Bug ID: 108378
   Summary: gcc generates fpu traps unsafe code for armv8-a+sve
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: evatux at gmail dot com
  Target Milestone: ---

Created attachment 54250
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54250&action=edit
original reproducer

Consider the following simple function that does fp32 inverse:
```
void foo(float *x, int len) {
    for (int i = 0; i < len; ++i) x[i] = 1.f / x[i];
}
```

GCC with “-O3 -march=armv8-a+sve” produces code that raises the FE_DIVBYZERO
FPU exception. The assembly for the function is the following:

```
whilelo p0.s, wzr, w0
fmov    z1.s, #1.0e+0
ptrue   p1.b, all
.L3:
ld1w    z0.s, p0/z, [x1, x2, lsl 2]
fdivr   z0.s, p1/m, z0.s, z1.s
st1w    z0.s, p0, [x1, x2, lsl 2]
add     x2, x2, x3
whilelo p0.s, w2, w0
b.any   .L3
```

Note that the p0 predicate register is used for loading. However, the division
uses p1, which is all-true. This leads to a division-by-zero FPE when executing
the fdivr instruction if len is not a multiple of the SVE vector width.
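
The effect can be sketched in plain C without SVE hardware. The model below is an assumption-laden emulation, not the attached reproducer: `VL` is a hypothetical vector length of 4 lanes, and the helpers mimic `whilelo`, a zeroing `ld1w`, and a divide governed by either the loop predicate or an all-true predicate. It counts the lanes in which the divide would see a zero divisor.

```c
#include <stdbool.h>

#define VL 4   /* hypothetical vector length, in 32-bit lanes */

/* Emulates "whilelo": lane k is active while base + k < len.  */
static void whilelo (bool pred[VL], int base, int len)
{
  for (int k = 0; k < VL; k++)
    pred[k] = base + k < len;
}

/* Emulates "ld1w ..., p/z": inactive lanes are zeroed, not loaded.  */
static void load_zeroing (float dst[VL], const float *src, int base,
                          const bool pred[VL])
{
  for (int k = 0; k < VL; k++)
    dst[k] = pred[k] ? src[base + k] : 0.0f;
}

/* Counts lanes where a divide governed by pred sees a zero divisor.  */
static int zero_divisor_lanes (const float v[VL], const bool pred[VL])
{
  int n = 0;
  for (int k = 0; k < VL; k++)
    if (pred[k] && v[k] == 0.0f)
      n++;
  return n;
}

/* Walks x[0..len) in chunks of VL and returns how many lanes would divide
   by zero, with the divide under the loop mask (p0) or under ptrue (p1).  */
int count_div_by_zero (const float *x, int len, bool divide_under_loop_mask)
{
  bool p0[VL], p1[VL];
  float z0[VL];
  int total = 0;

  for (int k = 0; k < VL; k++)
    p1[k] = true;                       /* ptrue */

  for (int base = 0; base < len; base += VL)
    {
      whilelo (p0, base, len);
      load_zeroing (z0, x, base, p0);
      total += zero_divisor_lanes (z0, divide_under_loop_mask ? p0 : p1);
    }
  return total;
}
```

With six non-zero inputs and `VL` of 4, the last chunk has two inactive lanes; under the all-true predicate those two zeroed lanes are divided, while under the loop mask none are.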

The issue is reproduced with gcc-10, gcc-11, and gcc-12.
Also checked gcc-trunk, by inspecting assembler output in godbolt.
Link: https://gcc.godbolt.org/z/Yz7chEcfT.

The reproducer is attached:
```
$ gcc-12 repro.c -O3 -march=armv8-a+sve -lm && ./a.out
fegetexceptflag(FE_DIVBYZERO) was:0 now:2
Test FAILED
```

Build w/o SVE or pass -DARRAY_LEN=64, and the test passes.
Adding options like `-fno-unsafe-math-optimizations` and `-ftrapping-math`
doesn't help.

The issue is especially noticeable when mixing Fortran and C code, as the
former has FPU exception checks enabled by default.

[Bug libgcc/108279] Improved speed for float128 routines

2023-01-11 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108279

--- Comment #5 from Michael_S  ---
Hi Thomas
Are you in or out?

If you are still in, I can use your help on several issues.

1. Torture.
See if the Invalid Operand exception is raised properly now, and whether there
are still remaining problems with NaN.

2. Run my correctness tests on as many non-AMD64 targets as you can. Preferably
with 100,000,000 iterations, but on weaker HW 10,000,000 will do.

3. Run my speed tests (tests/matmulq/mm_speed_ma) on a more diverse set of AMD64
computers than I did.
Of special interest are
- AMD Zen3 on Linux running on bare metal
- Intel Skylake, SkylakeX, Tiger/Rocket Lake and Alder Lake on Linux running on
bare metal
I realize that doing speed tests is not nearly as simple as correctness tests.
We need non-busy (preferably almost idle) machines that have a stable CPU clock
rate. It's not easy to find machines like that nowadays. But maybe you can
find at least some from the list.

4. Run my speed tests on as many non-obsolete ARM64 computers as you can find.
Well, probably wishful thinking on my part.


Also off topic but of interest: postprocessed source of matmul_r16.c

[Bug tree-optimization/99411] s311, s312, s31111, s31111, s3110, vsumr benchmark of TSVC is vectorized by clang better than by gcc

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99411

--- Comment #8 from Jan Hubicka  ---
Compared to aocc we also do worse on zen4:
jh@alberti:~/tsvc/bin> ~/trunk-install/bin/gcc -Ofast -march=native s311.c  
jh@alberti:~/tsvc/bin> time ./a.out

real    0m3.207s
user    0m3.206s
sys     0m0.000s
jh@alberti:~/tsvc/bin> ~/aocc-compiler-4.0.0/bin/clang -Ofast -march=native
s311.c 
jh@alberti:~/tsvc/bin> time ./a.out

real    0m1.221s
user    0m1.221s
sys     0m0.000s

aocc code seems similar to clang's from two years ago except for additional use
of avx512.

main:   # @main
.cfi_startproc
# %bb.0:  # %entry
xorl    %eax, %eax
.p2align 4, 0x90
.LBB0_1:  # %vector.ph
# =>This Loop Header: Depth=1
# Child Loop BB0_2 Depth 2
vxorps  %xmm0, %xmm0, %xmm0
movq    $-128000, %rcx  # imm = 0xFFFE0C00
vxorps  %xmm1, %xmm1, %xmm1
vxorps  %xmm2, %xmm2, %xmm2
vxorps  %xmm3, %xmm3, %xmm3
.p2align 4, 0x90
.LBB0_2:  # %vector.body
#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vaddps  a+128000(%rcx), %zmm0, %zmm0
vaddps  a+128064(%rcx), %zmm1, %zmm1
vaddps  a+128128(%rcx), %zmm2, %zmm2
vaddps  a+128192(%rcx), %zmm3, %zmm3
addq    $256, %rcx  # imm = 0x100
jne     .LBB0_2
# %bb.3:  # %middle.block
#   in Loop: Header=BB0_1 Depth=1
incl    %eax
cmpl    $100, %eax  # imm = 0xF4240
jne     .LBB0_1
# %bb.4:  # %for.cond.cleanup
vaddps  %zmm0, %zmm1, %zmm0
xorl    %eax, %eax
vaddps  %zmm0, %zmm2, %zmm0
vaddps  %zmm0, %zmm3, %zmm0
vextractf64x4   $1, %zmm0, %ymm1
vaddps  %zmm1, %zmm0, %zmm0
vextractf128    $1, %ymm0, %xmm1
vaddps  %xmm1, %xmm0, %xmm0
vpermilpd       $1, %xmm0, %xmm1  # xmm1 = xmm0[1,0]
vaddps  %xmm1, %xmm0, %xmm0
vmovshdup       %xmm0, %xmm1  # xmm1 = xmm0[1,1,3,3]
vaddss  %xmm1, %xmm0, %xmm0
vucomiss        .LCPI0_0(%rip), %xmm0
seta    %al
vzeroupper
retq

[Bug middle-end/99634] s2102 benchmarks of TSVC is vectorized better by icc than gcc, interchange is missing

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99634

--- Comment #2 from Jan Hubicka  ---
AOCC produced code is:

.LBB0_2:  # %vector.body
#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vpbroadcastq    %rdx, %zmm4
kxnorw  %k0, %k0, %k1
incq    %rdx
vpsllq  $2, %zmm4, %zmm4
vpaddq  %zmm4, %zmm0, %zmm4
vpaddq  %zmm7, %zmm4, %zmm5
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  .LCPI0_0(%rip), %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  .LCPI0_3(%rip), %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  .LCPI0_2(%rip), %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm11, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  .LCPI0_4(%rip), %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm13, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm12, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm15, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm14, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm17, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm16, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm19, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm18, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm21, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm20, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm23, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm22, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm25, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm24, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm27, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm26, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm29, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm28, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm31, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm30, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm2, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm1, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm8, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm6, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm5) {%k1}
vpaddq  %zmm10, %zmm4, %zmm5
kxnorw  %k0, %k0, %k1
vpaddq  %zmm9, %zmm4, %zmm4
vscatterqps %ymm3, (,%zmm5) {%k1}
kxnorw  %k0, %k0, %k1
vscatterqps %ymm3, (,%zmm4) {%k1}
movl    $1065353216, (%rcx)  # imm = 0x3F80
addq    $1028, %rcx  # imm = 0x404
cmpq    $256, %rdx  # imm = 0x100
jne     .LBB0_2
# %bb.3:  # %for.cond.cleanup3

[Bug fortran/108369] FM509 Fails to compile with error

2023-01-11 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108369

--- Comment #3 from kargl at gcc dot gnu.org ---
(In reply to anlauf from comment #1)
> (In reply to Ben Brewer from comment #0)
> Workaround: either use -std=legacy or fix the above argument declaration to:
> 
>   CHARACTER C1D001(*)*8,CVD001*8

While the workaround will work, it does so because it disables
-fallow-argument-mismatch.  But that feature is emitting a bogus
error/warning.

Note the following all compile and execute.  TKR is satisfied,
as I discussed in comment #2.

! Compiles and executes
program foo1
   character(len=10) a
   a = '1234567890'
   call sub1(a)
end
subroutine sub1(s)
   character(len=8) s
   print *, '>' // s // '<'
end

! Compiles and executes
program foo2
   character(len=10) a(5)
   a = '1234567890'
   call sub2(a)
end
subroutine sub2(s)
   character(len=8) s(2)
   print *, '>' // s(1) // '<'
end

! Compiles and executes
program foo3
   character(len=10) a(5)
   a = '1234567890'
   call sub3(a(2)(2:4))
end
subroutine sub3(s)
   character(len=8) s(2)
   print *, '>' // s(1) // '<'
end

But,

% gfcx -o z a.f90 && ./z
a.f90:40:13:

   40 |call sub4(a(2)(2:4))
  | 1
Error: Actual argument contains too few elements for dummy
argument 's' (39/80) at (1)

! Whoops
program foo4
   character(len=10) a(5)
   a = '1234567890'
   call sub4(a(2)(2:4))
end
subroutine sub4(s)
   character(len=8) s(10)  ! <-- only difference from foo3
   print *, '>' // s(1) // '<'
end

The giveaway that something is amiss is the (39/80) part of
the error message.  80 = 8*10, i.e., total number of characters.
I cannot quite get 39.  39 = 50 - 11, but 11 does not match up
with the substring length of a(2)(2:4).
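
One possible reading of the first number, offered as an assumption rather than something confirmed in this thread: it may count the characters from the start of the substring to the end of the actual array, as under storage sequence association. The hypothetical helper `remaining_chars` below just does that arithmetic.

```c
/* Hypothetical helper: characters from position `start` of element `elem`
   (both 1-based) to the end of an array of `nelem` elements, each
   `elem_len` characters long.  */
long remaining_chars (long nelem, long elem_len, long elem, long start)
{
  return (elem_len - start + 1) + (nelem - elem) * elem_len;
}
```

For foo4, a(2)(2:4) in a 5-element array of length-10 strings gives (10 - 2 + 1) + (5 - 2) * 10 = 39. For the reduced FM509 testcase in comment #2, C1N001(5)(2:9) in a 6-element array of length-10 strings gives (10 - 2 + 1) + (6 - 5) * 10 = 19, matching the (19/48) reported there.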

Now, looking at interface.cc starting at line 3354, we have


  if (a->expr->ts.type == BT_CHARACTER && !f->sym->as && where)
    {
      gfc_warning (0, "Character length of actual argument shorter "
                   "than of dummy argument %qs (%lu/%lu) at %L",
                   f->sym->name, actual_size, formal_size,
                   &a->expr->where);
      goto skip_size_check;
    }
  else if (where)
    {
      /* Emit a warning for -std=legacy and an error otherwise. */
      if (gfc_option.warn_std == 0)
        gfc_warning (0, "Actual argument contains too few "
                     "elements for dummy argument %qs (%lu/%lu) "
                     "at %L", f->sym->name, actual_size,
                     formal_size, &a->expr->where);
      else
        gfc_error_now ("Actual argument contains too few "
                       "elements for dummy argument %qs (%lu/%lu) "
                       "at %L", f->sym->name, actual_size,
                       formal_size, &a->expr->where);
    }

clearly we want the first branch about character length, so
`where` == NULL

[Bug analyzer/108252] false positive: leak detection

2023-01-11 Thread dmalcolm at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108252

--- Comment #4 from David Malcolm  ---
Should be fixed on trunk for gcc 13 by the above commit.

I *think* the store::set_value change can be readily backported to GCC 12, so
keeping this bug open to track that backport (perhaps even earlier???)

[Bug fortran/108369] FM509 Fails to compile with error

2023-01-11 Thread kargl at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108369

kargl at gcc dot gnu.org changed:

   What|Removed |Added

 Status|WAITING |NEW
 CC||kargl at gcc dot gnu.org

--- Comment #2 from kargl at gcc dot gnu.org ---
Reduced testcase

  PROGRAM FM509

  IMPLICIT CHARACTER*27 (C)

  CHARACTER C1N001(6)*10
  DATA C1N001 /'FIRST-AID:','SECONDRATE','THIRD-TERM',
 1 'FOURTH-DAY','FIFTHROUND','SIXTHMONTH'/

  CVCOMP = ' '
  CALL SN512(C1N001(5)(2:9),CVCOMP)

  END

  SUBROUTINE SN512(C1D001,CVD001)
  CHARACTER C1D001(6)*8,CVD001*8
  CVD001 = C1D001(1)
  END

gfcx -w -o z y.f 
y.f:10:17:

   10 |   CALL SN512(C1N001(5)(2:9),CVCOMP)
  | 1
Error: Actual argument contains too few elements for dummy argument 'c1d001'
(19/48) at (1)

Normally, this comes down to type, kind type parameter, and rank (TKR)
matching of the actual and dummy arguments.

In the call to sn512, c1n001(5)(2:9) is a substring of length
8 of the fifth element of the array c1n001.  The subroutine is
expecting to receive a 6-element array with each element having
a length of 8.

   15.5.2.4 Ordinary dummy variables
   ...
   The dummy argument shall be type compatible with the actual
   argument.
   ...
   The kind type parameter values of the actual argument shall
   agree with the corresponding ones of the dummy argument. The
   length type parameter values of a present actual argument shall
   agree with the corresponding ones of the dummy argument that
   are not assumed, except for the case of the character length
   parameter of an actual argument of type character with default
   kind or C character kind (18.2.2) associated with a dummy
   argument that is not assumed-shape or assumed-rank.

   3.147.11
   type compatible
   compatibility of the type of one entity with respect to another for
   purposes such as argument association, pointer association, and
   allocation (7.3.2)

Hmmm, it appears the argument mismatch feature added under the
-fallow-argument-mismatch option might be running afoul of the
standard.

[Bug analyzer/108252] false positive: leak detection

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108252

--- Comment #3 from CVS Commits  ---
The master branch has been updated by David Malcolm :

https://gcc.gnu.org/g:688fc162b76dc6747a30fcfd470f4770da0f4924

commit r13-5113-g688fc162b76dc6747a30fcfd470f4770da0f4924
Author: David Malcolm 
Date:   Wed Jan 11 16:27:06 2023 -0500

analyzer: fix leak false positives on "*UNKNOWN = PTR;" [PR108252]

PR analyzer/108252 reports a false positive from -Wanalyzer-malloc-leak on
code like this:

  *ptr_ptr = strdup(EXPR);

where ptr_ptr is an UNKNOWN_VALUE.

When we handle:
  *UNKNOWN = PTR;
store::set_value normally marks *PTR as having escaped, and this means
we don't report PTR as leaking when the last usage of PTR is lost.

However this only works for cases where PTR is a region_svalue.
In the example in the bug, it's a conjured_svalue, rather than a
region_svalue.  A similar problem can arise for FDs, which aren't
pointers.

This patch fixes the bug by updating store::set_value to mark any
values stored via *UNKNOWN = VAL as not leaking.

Additionally, sm-malloc.cc's known_allocator_p hardcodes strdup and
strndup as allocators (and thus transitioning their result to
"unchecked"), but we don't implement known_functions for these, leading
to the LHS being a CONJURED_SVALUE, rather than a region_svalue to a
heap-allocated region.  A similar issue happens with functions marked
with __attribute__((malloc)).  As part of a "belt and braces" fix, the
patch also updates the handling of these functions, so that they use
heap-allocated regions.

gcc/analyzer/ChangeLog:
PR analyzer/108252
* kf.cc (class kf_strdup): New.
(class kf_strndup): New.
(register_known_functions): Register them.
* region-model.cc (region_model::on_call_pre): Use
_ALLOCATED_REGION for the default result of an external
function with the "malloc" attribute, rather than CONJURED_SVALUE.
(region_model::get_or_create_region_for_heap_alloc): Allow
"size_in_bytes" to be NULL.
* store.cc (store::set_value): When handling *UNKNOWN = VAL,
mark VAL as "maybe bound".

gcc/testsuite/ChangeLog:
PR analyzer/108252
* gcc.dg/analyzer/attr-malloc-pr108252.c: New test.
* gcc.dg/analyzer/fd-leak-pr108252.c: New test.
* gcc.dg/analyzer/flex-with-call-summaries.c: Remove xfail from
warning false +ve directives.
* gcc.dg/analyzer/pr103217-2.c: Add -Wno-analyzer-too-complex.
* gcc.dg/analyzer/pr103217-3.c: Likewise.
* gcc.dg/analyzer/strdup-pr108252.c: New test.
* gcc.dg/analyzer/strndup-pr108252.c: New test.

Signed-off-by: David Malcolm 
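
A minimal sketch of the code shape the commit message describes, assuming a hypothetical allocator `dup_string` marked with `__attribute__((malloc))` and a hypothetical caller `store_copy`. It is not one of the tests added by the commit. The allocation is stored through a pointer whose target the analyzer cannot see, so ownership passes to the caller and nothing leaks here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator carrying the "malloc" attribute.  */
__attribute__ ((malloc))
char *dup_string (const char *s)
{
  size_t len = strlen (s) + 1;
  char *copy = malloc (len);
  if (copy)
    memcpy (copy, s, len);
  return copy;
}

/* Stores the new allocation through OUT, i.e. the "*UNKNOWN = PTR;" shape.
   The caller owns *OUT afterwards and is responsible for freeing it.  */
int store_copy (char **out, const char *s)
{
  *out = dup_string (s);
  return *out ? 0 : -1;
}
```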

[Bug tree-optimization/108377] Unexpected 'exceeds maximum object size' diagnostic, wrong-code?

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108377

--- Comment #3 from Andrew Pinski  ---
Just adding:

  if (n+1 == 0)  __builtin_unreachable();

right before the first malloc removes the warning, as expected.

[Bug tree-optimization/108377] Unexpected 'exceeds maximum object size' diagnostic, wrong-code?

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108377

--- Comment #2 from Andrew Pinski  ---
So we have:
  const __SIZE_TYPE__ n = calc_n(259);
#if 1
  haystack = __builtin_malloc(n + 1);
  if (!haystack)
    __builtin_abort();
  for (__SIZE_TYPE__ i = 0; i < n + 1; ++i)
    haystack[i] = '0';
#endif
  needle = __builtin_malloc(n); 

If calc_n(259) returns (__SIZE_TYPE__)-1 (aka 18446744073709551615), n+1 would
be 0, which is fine for malloc. The for loop is then skipped if n+1 == 0, and
jump threading happens, so you get two copies of the second malloc, one of
which is called with 18446744073709551615.

So the warning (and the code produced) is correct in some sense of correctness.
Maybe the best thing is to add an assume after the call to calc_n saying that
the result will be small, or smaller than some bound.
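
The wraparound can be sketched as follows. `calc_n_stub` is a hypothetical stand-in for calc_n (the real one is in the attached 1.c and is not reproduced here), and `alloc_needle` shows one way to rule out the problematic value before allocating; returning NULL is an illustrative choice, where the comments in this bug use `__builtin_unreachable()` instead.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for calc_n(): returns SIZE_MAX in the worst case.  */
size_t calc_n_stub (int worst_case)
{
  return worst_case ? SIZE_MAX : 16;
}

/* Nonzero if "for (i = 0; i < n + 1; ++i)" runs at least once.
   Unsigned arithmetic wraps, so n == SIZE_MAX makes n + 1 == 0.  */
int haystack_loop_runs (size_t n)
{
  return 0 < n + 1;
}

/* Allocates n bytes, refusing the value for which n + 1 wraps to 0.  */
char *alloc_needle (size_t n)
{
  if (n + 1 == 0)
    return NULL;
  return malloc (n);
}
```

On the path where the loop is skipped, n is known to be SIZE_MAX, which is the constant that shows up in the duplicated malloc call.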

[Bug c/105972] [12/13 Regression] ICE in lower_stmt, at gimple-low.cc:312 since r12-4608-gb4702276615ff8d4

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105972

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:23b4ce18379cd336d99d7c71701be28118905b57

commit r13-5112-g23b4ce18379cd336d99d7c71701be28118905b57
Author: Jakub Jelinek 
Date:   Wed Jan 11 22:18:42 2023 +0100

c: Don't emit DEBUG_BEGIN_STMTs for K&R function argument declarations
[PR105972]

K&R function parameter declarations are handled by calling
recursively c_parser_declaration_or_fndef in a loop, where each such
call will add_debug_begin_stmt at the start.
Now, if the K&R function definition is not a nested function,
building_stmt_list_p () is false and so we don't emit the DEBUG_BEGIN_STMTs
anywhere, but if it is a nested function, we emit it in the containing
function at the point of the nested function definition.
As the following testcase shows, it can cause ICEs if the containing
function has var-tracking disabled but nested function has them enabled,
as the DEBUG_BEGIN_STMTs are added to the containing function which
shouldn't have them but MAY_HAVE_DEBUG_MARKER_STMTS is checked already
for the nested function, or just wrong experience in the debugger.

The following patch ensures we don't emit any such DEBUG_BEGIN_STMTs for
the K&R function parameter declarations even in nested functions.

2023-01-11  Jakub Jelinek  

PR c/105972
* c-parser.cc (c_parser_declaration_or_fndef): Disable debug non-bind
markers for K&R function parameter declarations of nested functions.

* gcc.dg/pr105972.c: New test.

[Bug tree-optimization/108377] Unexpected 'exceeds maximum object size' diagnostic, wrong-code?

2023-01-11 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108377

--- Comment #1 from Thomas Schwinge  ---
That's x86_64-pc-linux-gnu at today's commit
de99049f6fe5341024d4d939ac50d063280f90db.

[Bug modula2/108261] modula-2 module registration process seems to fail with shared libraries.

2023-01-11 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108261

--- Comment #19 from Iain Sandoe  ---
(In reply to Gaius Mulley from comment #18)
> From the runtime perspective, your layered approach is much cleaner.

indeed .. 

> It would be good to allow users to be able to use pim and some iso
> functionality or vice versa.

So we have:

1. some content that is common (unequivocally) = com

2. some content that must be local to iso (it conflicts with 3) = iso

3. some content that must be local to pim (it conflicts with 2) = pim

4. some iso content that could also be used by pim = ixt

5. some pim content that could also be used by iso = pxt

assuming we cannot combine ixt and pxt (because doing so would make it
impossible to have a 'strict' mode)

 we could avoid a layering violation thus:

fiso (strict) = iso + ixt +  com
fiso (non-strict) = iso + ixt + pxt + com

fpim* (strict?) pim + pxt + com
fpim* (non-strict) pim + pxt + ixt + com


= still seems overly complex (but workable) ..

Of course, if we can alter the mangling of the iso and pim content so that it
does not conflict .. 

.. then IMO we can just have one combined library and the FE should be
responsible for disallowing interfaces that are not permitted by the selected
dialect.

(i.e. if the code never refers to a disallowed symbol, then it does not matter
if the library contains it.)

[Bug tree-optimization/108377] New: Unexpected 'exceeds maximum object size' diagnostic, wrong-code?

2023-01-11 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108377

Bug ID: 108377
   Summary: Unexpected 'exceeds maximum object size' diagnostic,
wrong-code?
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
  Target Milestone: ---

Created attachment 54249
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54249&action=edit
1.c

Am I confused (it's late), or is GCC?  For '-O2' and higher:

1.c: In function ‘f’:
1.c:22:12: warning: argument 1 value ‘18446744073709551615’ exceeds maximum
object size 9223372036854775807 [-Walloc-size-larger-than=]
   22 |   needle = __builtin_malloc(n); /* { dg-bogus {exceeds maximum
object size} } */
  |^~~
1.c:22:12: note: in a call to built-in allocation function
‘__builtin_malloc’

Manually reduced from some other test case.

Same issue for actual 'malloc', and 'size_t'.

This supposedly bogus 'needle' diagnostic disappears if I disable the
'haystack' allocation of 'n + 1'.

Actually, is this wrong-code?

1.c.128t.sra:  _2 = __builtin_malloc (_1);
1.c.128t.sra:  _5 = __builtin_malloc (n_14);

1.c.129t.thread1:  _2 = __builtin_malloc (_1);
1.c.129t.thread1:  _10 = __builtin_malloc (n_14);
1.c.129t.thread1:  _5 = __builtin_malloc (n_14);

1.c.130t.dom2:  _2 = __builtin_malloc (_1);
1.c.130t.dom2:  _10 = __builtin_malloc (18446744073709551615);
1.c.130t.dom2:  _5 = __builtin_malloc (n_14);

[...]

1.c.194t.fre5:  _2 = __builtin_malloc (_1);
1.c.194t.fre5:  _10 = __builtin_malloc (18446744073709551615);
1.c.194t.fre5:  _5 = __builtin_malloc (n_14);

1.c.195t.thread2:  _2 = __builtin_malloc (_1);
1.c.195t.thread2:  _10 = __builtin_malloc (18446744073709551615);
1.c.195t.thread2:  _33 = __builtin_malloc (n_14);
1.c.195t.thread2:  _5 = __builtin_malloc (n_14);

1.c.196t.dom3:  _2 = __builtin_malloc (_1);
1.c.196t.dom3:  _10 = __builtin_malloc (18446744073709551615);
1.c.196t.dom3:  _33 = __builtin_malloc (i_51);
1.c.196t.dom3:  _5 = __builtin_malloc (0);

[...]

1.c.254t.optimized:  _2 = __builtin_malloc (_1);
1.c.254t.optimized:  _10 = __builtin_malloc (18446744073709551615);
1.c.254t.optimized:  _33 = __builtin_malloc (i_51);
1.c.254t.optimized:  _5 = __builtin_malloc (0);

[Bug tree-optimization/107838] spurious "may be used uninitialized" warning on variable initialized at the first iteration of a loop

2023-01-11 Thread guilherme.janczak at yandex dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107838

Guilherme Janczak  changed:

   What|Removed |Added

 CC||guilherme.janczak at yandex dot com

--- Comment #3 from Guilherme Janczak  ---
I just ran into the same bug with the following code:

/* rand32: return 32 bits of randomness */
static
unsigned long rand32(void)
{
        int rand(void);
        unsigned long rval;

        /*
         * There is modulo bias in this example if RAND_MAX isn't a power of 2
         * minus 1, but that is irrelevant for the bug report.
         *
         * C only guarantees RAND_MAX is at least 32767, that is, 15 bits.
         */
        rval = rand() & 0x7FFF;
        rval <<= 15;
        rval |= rand() & 0x7FFF;
        rval <<= 2;
        rval |= rand() & 0x03;
        return rval;

}

/* rand_hex: fill a buffer with random hex digits */
void rand_hex(unsigned char *buf, int len)
{
        const char *hex = "0123456789ABCDEF";
        int i;
        unsigned long r;

        for (i = 0; i < len; i++) {
                /* If we don't have any random bits in r, get some more. */
                if (i % 8 == 0)
                        r = rand32();

                /* Use 4 bits from the 32-bit integer at a time. */
                buf[i] = hex[r & 0x0F];
                r >>= 4;
        }
}

$ gcc -O -c -Wmaybe-uninitialized test.c
test.c: In function 'rand_hex':
test.c:37:19: warning: 'r' may be used uninitialized in this function
[-Wmaybe-uninitialized]
   37 | r >>= 4;
  | ~~^


Notice how the 1st usage of r doesn't cause the warning, but the 2nd one does.
I haven't tested GCC 13.0.0; I get this with GCC 12.2.1 from Alpine Linux
and 11.2.0 from OpenBSD. Here are their respective `gcc -v` outputs:
gcc version 12.2.1 20220924 (Alpine 12.2.1_git20220924-r4)
gcc version 11.2.0 (GCC)
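
For anyone hitting the same diagnostic, one possible restructuring is sketched
below. It is not a fix proposed in this report, only an illustration: the name
`rand_hex_tracked` and the `bits_left` counter are hypothetical, and `rand32`
repeats the construction from the code above. The idea is to give `r` an
explicit initializer and refill it when a counter of remaining bits reaches
zero, so the compiler no longer has to prove anything about `i % 8 == 0`.

```c
#include <stdlib.h>

/* rand32: same construction as in the report (32 bits from three rand() calls). */
static unsigned long rand32(void)
{
        unsigned long rval;

        rval = rand() & 0x7FFF;
        rval <<= 15;
        rval |= rand() & 0x7FFF;
        rval <<= 2;
        rval |= rand() & 0x03;
        return rval;
}

/* rand_hex_tracked: hypothetical variant of rand_hex from the report. */
void rand_hex_tracked(unsigned char *buf, int len)
{
        const char *hex = "0123456789ABCDEF";
        unsigned long r = 0;    /* explicit initializer, never read before a refill */
        int bits_left = 0;      /* no random bits available yet */
        int i;

        for (i = 0; i < len; i++) {
                /* Refill when the previous 32 bits have been consumed. */
                if (bits_left == 0) {
                        r = rand32();
                        bits_left = 32;
                }

                /* Use 4 bits from the 32-bit integer at a time. */
                buf[i] = hex[r & 0x0F];
                r >>= 4;
                bits_left -= 4;
        }
}
```

The refill still happens every 8 digits, as in the original loop; only the way
that fact is expressed changes.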

[Bug middle-end/108376] TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
I think your GCC dumps are for the wrong loop.

[Bug tree-optimization/108334] Strange message in libgav1

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108334

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |UNCONFIRMED
 Ever confirmed|1   |0

[Bug fortran/108369] FM509 Fails to compile with error

2023-01-11 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108369

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2023-01-11
 Ever confirmed|0   |1

--- Comment #1 from anlauf at gcc dot gnu.org ---
(In reply to Ben Brewer from comment #0)
> The most critical of these behavioural changes is that the NIST F77 test
> suite fails on FM509.FOR with the following:
> "Error: Actual argument contains too few elements for dummy argument
> ‘c1d001’ (19/48) at (1)"

The error message refers to:

  SUBROUTINE SN512(C1D001,CVD001)
  CHARACTER C1D001(6)*8,CVD001*8
  CVD001 = C1D001(1)
  RETURN
  END

and the caller is:

  CALL SN512(C1N001(5)(2:9),CVCOMP)

which passes an array of size 1.

Workaround: either use -std=legacy or fix the above argument declaration to:

  CHARACTER C1D001(*)*8,CVD001*8

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Eric Botcazou  changed:

   What|Removed |Added

 Resolution|--- |FIXED
   Target Milestone|--- |11.4
 Status|REOPENED|RESOLVED

--- Comment #20 from Eric Botcazou  ---
Fixed on mainline, 12 and 11 branches.

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #19 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Eric Botcazou:

https://gcc.gnu.org/g:01e80c4c630a5a6286a521b047d0ef80631c892c

commit r11-10463-g01e80c4c630a5a6286a521b047d0ef80631c892c
Author: Eric Botcazou 
Date:   Wed Jan 11 15:58:47 2023 +0100

Fix problematic interaction between bitfields, unions, SSO and SRA

The handling of bitfields by the SRA pass is peculiar and this must be taken
into account to support the scalar_storage_order attribute.

gcc/
PR tree-optimization/108199
* tree-sra.c (sra_modify_expr): Deal with reverse storage order
for bit-field references.

gcc/testsuite/
* gcc.dg/sso-17.c: New test.

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #18 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Eric Botcazou:

https://gcc.gnu.org/g:eec3a65ed638a1c58fa08ddf508d2d60b64d311d

commit r12-9041-geec3a65ed638a1c58fa08ddf508d2d60b64d311d
Author: Eric Botcazou 
Date:   Wed Jan 11 15:58:47 2023 +0100

Fix problematic interaction between bitfields, unions, SSO and SRA

The handling of bitfields by the SRA pass is peculiar and this must be taken
into account to support the scalar_storage_order attribute.

gcc/
PR tree-optimization/108199
* tree-sra.cc (sra_modify_expr): Deal with reverse storage order
for bit-field references.

gcc/testsuite/
* gcc.dg/sso-17.c: New test.

[Bug c/108375] [10/11/12/13 Regression] Some variably modified types not detected as such

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108375

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> https://gcc.gnu.org/pipermail/gcc-patches/2006-May/194375.html

I can't tell if the Ada testcase was added or not.

[Bug c/108375] [10/11/12/13 Regression] Some variably modified types not detected as such

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108375

--- Comment #2 from Andrew Pinski  ---
https://gcc.gnu.org/pipermail/gcc-patches/2006-May/194375.html

[Bug c/108375] [10/11/12/13 Regression] Some variably modified types not detected as such

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108375

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2023-01-11
 Ever confirmed|0   |1
   Target Milestone|--- |10.5
  Known to fail||4.4.7
  Known to work||4.1.2
 Status|UNCONFIRMED |NEW

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug preprocessor/108244] [13 Regression] `pragma GCC diagnostic` and -E -fdirectives-only causes the preprocessor to become confused since r13-1544-ge46f4d7430c52104

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108244

Andrew Pinski  changed:

   What|Removed |Added

 CC||thiago at kde dot org

--- Comment #10 from Andrew Pinski  ---
*** Bug 108372 has been marked as a duplicate of this bug. ***

[Bug preprocessor/108372] [12 regression] -E -fdirectives-only crash

2023-01-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108372

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Exact dup of bug 108244.

*** This bug has been marked as a duplicate of bug 108244 ***

[Bug tree-optimization/99412] s352 benchmark of TSVC is vectorized by clang and not by gcc

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99412

--- Comment #2 from Jan Hubicka  ---
This is also seen on zen4 when comparing gcc and aocc (about a 2.3x
difference).

[Bug tree-optimization/99408] s3251 benchmark of TSVC vectorized by clang runs about 7 times faster compared to gcc

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99408

--- Comment #3 from Jan Hubicka  ---
On zen4 the gcc-built loop takes 19s, while the aocc build takes 6.6s.

aocc:

.LBB0_1:                                # %for.cond22.preheader
# =>This Loop Header: Depth=1
# Child Loop BB0_2 Depth 2
vbroadcastss    a(%rip), %zmm20
xorl    %ecx, %ecx
.p2align        4, 0x90
.LBB0_2:                                # %vector.body
#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vmovups c(%rcx), %zmm13
vmovaps %zmm20, %zmm12
vmovups e(%rcx), %zmm0
vaddps  b(%rcx), %zmm13, %zmm20
vmulps  %zmm13, %zmm0, %zmm13
vmovaps %zmm20, %zmm15
vpermt2ps   %zmm12, %zmm29, %zmm15
vmovups %zmm20, a+4(%rcx)
vmovups %zmm13, b(%rcx)
vmulps  %zmm0, %zmm15, %zmm12
vmovups %zmm12, d(%rcx)
addq    $64, %rcx
cmpq    $127936, %rcx   # imm = 0x1F3C0
jne .LBB0_2
# %bb.3:                                # %middle.block

vextractf32x4   $3, %zmm20, %xmm5
vmovss  -4(%rsp), %xmm2 # 4-byte Reload
# xmm2 = mem[0],zero,zero,zero
vmovss  -12(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
incl    %eax
vaddss  b+127936(%rip), %xmm2, %xmm2
vpermilps   $231, %xmm5, %xmm5  # xmm5 = xmm5[3,1,2,3]
vmulss  -8(%rsp), %xmm5, %xmm5  # 4-byte Folded Reload
vmovss  %xmm0, b+127936(%rip)
vmovss  -16(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vmovss  %xmm2, a+127940(%rip)
vmulss  -20(%rsp), %xmm2, %xmm2 # 4-byte Folded Reload
vmovss  %xmm5, d+127936(%rip)
vaddss  b+127940(%rip), %xmm0, %xmm5
vmovss  -24(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vmovss  %xmm0, b+127940(%rip)
vmovss  -28(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vmovss  %xmm2, d+127940(%rip)
vaddss  b+127944(%rip), %xmm0, %xmm2
vmovss  -36(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vmovss  %xmm5, a+127944(%rip)
vmulss  -32(%rsp), %xmm5, %xmm5 # 4-byte Folded Reload
vmovss  %xmm0, b+127944(%rip)
vmovss  -40(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vmovss  %xmm2, a+127948(%rip)
vmulss  %xmm22, %xmm2, %xmm2
vmovss  %xmm5, d+127944(%rip)
vaddss  b+127948(%rip), %xmm21, %xmm5
vmovss  %xmm2, d+127948(%rip)
vmovss  %xmm0, b+127948(%rip)
vmovss  -44(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vaddss  b+127952(%rip), %xmm24, %xmm2
vmovss  %xmm5, a+127952(%rip)
vmulss  %xmm25, %xmm5, %xmm5
vmovss  %xmm0, b+127952(%rip)
vmovss  -48(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vmovss  %xmm5, d+127952(%rip)
vaddss  b+127956(%rip), %xmm27, %xmm5
vmovss  %xmm2, a+127956(%rip)
vmulss  %xmm28, %xmm2, %xmm2
vmovss  %xmm2, d+127956(%rip)
vmovss  %xmm0, b+127956(%rip)
vmovss  -52(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vaddss  b+127960(%rip), %xmm30, %xmm2
vmovss  %xmm5, a+127960(%rip)
vmulss  %xmm31, %xmm5, %xmm5
vmovss  %xmm5, d+127960(%rip)
vmovss  %xmm0, b+127960(%rip)
vmovss  -56(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vaddss  b+127964(%rip), %xmm16, %xmm5
vmovss  %xmm2, a+127964(%rip)
vmulss  %xmm18, %xmm2, %xmm2
vmovss  %xmm2, d+127964(%rip)
vmovss  %xmm0, b+127964(%rip)
vmovss  -60(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vaddss  b+127968(%rip), %xmm19, %xmm2
vmovss  %xmm5, a+127968(%rip)
vmulss  %xmm1, %xmm5, %xmm5
vmovss  %xmm5, d+127968(%rip)
vmovss  %xmm0, b+127968(%rip)
vmovss  -64(%rsp), %xmm0# 4-byte Reload
# xmm0 = mem[0],zero,zero,zero
vaddss  b+127972(%rip), %xmm3, %xmm5
vmovss  

[Bug middle-end/108376] New: TSVC s1279 runs 40% faster with aocc than gcc at zen4

2023-01-11 Thread hubicka at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108376

Bug ID: 108376
   Summary: TSVC s1279 runs 40% faster with aocc than gcc at zen4
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

jh@alberti:~/tsvc/bin> more s1279.c
#include 
#include 

typedef float real_t;
#define iterations 100
#define LEN_1D 32000
#define LEN_2D 256
real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D];
real_t aa[LEN_2D][LEN_2D];
real_t bb[LEN_2D][LEN_2D];
real_t cc[LEN_2D][LEN_2D];
real_t qq;
int
main(void)
{
    //reductions
    //if to max reduction

    real_t x;
    int * __restrict__ ip = (int *) malloc(LEN_1D*sizeof(real_t));

    for (int i = 0; i < LEN_1D; i = i+5){
        (ip)[i]   = (i+4);
        (ip)[i+1] = (i+2);
        (ip)[i+2] = (i);
        (ip)[i+3] = (i+3);
        (ip)[i+4] = (i+1);
    }
    for (int nl = 0; nl < iterations; nl++) {
        for (int i = 0; i < LEN_1D; i++) {
            if (a[i] < (real_t)0.) {
                if (b[i] > a[i]) {
                    c[i] += d[i] * e[i];
                }
            }
        }
        //dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    return x;
}
jh@alberti:~/tsvc/bin> ~/trunk-install/bin/gcc -Ofast -march=native s1279.c   
jh@alberti:~/tsvc/bin> perf stat ./a.out

 Performance counter stats for './a.out':

           2762.85 msec task-clock:u              #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
               265      page-faults:u             #   95.915 /sec
       10155904052      cycles:u                  #    3.676 GHz                      (83.34%)
             20767      stalled-cycles-frontend:u #    0.00% frontend cycles idle     (83.36%)
             36970      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.36%)
       27985795691      instructions:u            #    2.76  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.36%)
        1999265642      branches:u                #  723.624 M/sec                    (83.36%)
            502031      branch-misses:u           #    0.03% of all branches          (83.23%)

       2.764553907 seconds time elapsed

       2.763249000 seconds user
       0.0 seconds sys


jh@alberti:~/tsvc/bin> ~/aocc-compiler-4.0.0/bin/clang -Ofast -march=native
s1279.c 
jh@alberti:~/tsvc/bin> perf stat ./a.out

 Performance counter stats for './a.out':

           1980.94 msec task-clock:u              #    0.999 CPUs utilized
                 0      context-switches:u        #    0.000 /sec
                 0      cpu-migrations:u          #    0.000 /sec
                77      page-faults:u             #   38.871 /sec
        7261166980      cycles:u                  #    3.666 GHz                      (83.25%)
             16796      stalled-cycles-frontend:u #    0.00% frontend cycles idle     (83.25%)
             34506      stalled-cycles-backend:u  #    0.00% backend cycles idle      (83.25%)
       10498254812      instructions:u            #    1.45  insn per cycle
                                                  #    0.00  stalled cycles per insn  (83.40%)
        1500160478      branches:u                #  757.299 M/sec                    (83.45%)
           1000905      branch-misses:u           #    0.07% of all branches          (83.40%)

       1.982364055 seconds time elapsed

       1.98146 seconds user
       0.0 seconds sys


aocc does:
.LBB0_6:                                # %for.inc43.vec.bb
#   in Loop: Header=BB0_2 Depth=2
addq    $256, %rcx      # imm = 0x100
cmpq    $128000, %rcx   # imm = 0x1F400
je  .LBB0_7
.LBB0_2:                                # %vector.body
#   Parent Loop BB0_1 Depth=1
# =>  This Inner Loop Header: Depth=2
vmovups a(%rcx), %zmm1
vmovups a+64(%rcx), %zmm2
vmovups a+128(%rcx), %zmm3
vmovups a+192(%rcx), %zmm4
# implicit-def: $k4
vcmpltps        %zmm0, %zmm1, %k0
vcmpltps        %zmm0, %zmm2, %k1
vcmpltps        %zmm0, %zmm3, %k2
vcmpltps        %zmm0, %zmm4, %k3
kunpckwd        %k0, %k1, %k0
kunpckwd        %k2, %k3, %k1
# implicit-def: $k2
# implicit-def: $k3
kunpckdq        %k0, %k1, %k0
   

[Bug c/108375] New: [10/11/12/13 Regression] Some variably modified types not detected as such

2023-01-11 Thread jsm28 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108375

Bug ID: 108375
   Summary: [10/11/12/13 Regression] Some variably modified types
not detected as such
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jsm28 at gcc dot gnu.org
  Target Milestone: ---

variably_modified_type_p fails to detect an array type as variably modified if
the array and its element type are of constant size but the element type is
variably modified. For example, the following code should be diagnosed as
invalid, but is not (similar rejects-valid or wrong-code examples could no
doubt be constructed as well).

void
f (int a)
{
  typedef int A[a];
  goto x;
  A *p[2];
  x : ;
}
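
To make the terminology concrete, here is a small sketch of a type that has
constant size yet is variably modified: an array of two pointers to a VLA. The
helper names `outer_size` and `pointee_size` are made up for illustration and
do not come from the report. The array object itself always occupies two
pointers, while the type its elements point to has a size that depends on `a`.

```c
#include <stddef.h>

/* Size of the array object itself: two pointers, independent of a. */
size_t outer_size(int a)
{
  typedef int A[a];
  A *p[2];
  return sizeof p;
}

/* Size of the pointed-to VLA type: a ints, computed at run time.
   a is assumed to be positive.  */
size_t pointee_size(int a)
{
  typedef int A[a];
  return sizeof (A);
}
```

Because the element type `A *` still refers to the run-time bound `a`, a jump
past the declaration of `p` skips into the scope of a variably modified
identifier, which is what the example in the report expects to be diagnosed.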

This is a regression in 4.2 and later relative to older versions, I think
introduced by g:2e3b8fe7b5405a94d86bfa323c0e80e83c58d784 .

commit 2e3b8fe7b5405a94d86bfa323c0e80e83c58d784
Author: Eric Botcazou 
Date:   Wed May 17 13:11:09 2006 +

tree.c (variably_modified_type_p): Return true if the element type is
variably modified without recursing.

* tree.c (variably_modified_type_p) : Return true
if the element type is variably modified without recursing.

From-SVN: r113858

[Bug tree-optimization/108374] [12/13 Regression] unexpected -Wstringop-overflow when using std::atomic and std::shared_ptr

2023-01-11 Thread romain.geissler at amadeus dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108374

--- Comment #1 from Romain Geissler  ---
I forgot to mention: this happens on x86-64 with -O1.

[Bug tree-optimization/108374] New: [12/13 Regression] unexpected -Wstringop-overflow when using std::atomic and std::shared_ptr

2023-01-11 Thread romain.geissler at amadeus dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108374

Bug ID: 108374
   Summary: [12/13 Regression] unexpected -Wstringop-overflow when
using std::atomic and std::shared_ptr
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: romain.geissler at amadeus dot com
  Target Milestone: ---

Hi,

The following snippet produces an unexpected -Wstringop-overflow with gcc 12
and current trunk:

#include 
#include 

struct A: public std::enable_shared_from_this
{
std::atomic _attr;
};

void f(std::shared_ptr pointer)
{
std::weak_ptr weakPointer(pointer);

[[maybe_unused]] const unsigned int aAttr = weakPointer.lock()->_attr;
}


Compiler Explorer output:
In file included from
/opt/compiler-explorer/gcc-trunk-20230111/include/c++/13.0.0/atomic:41,
 from :1:
In member function 'std::__atomic_base<_IntTp>::__int_type
std::__atomic_base<_IntTp>::load(std::memory_order) const [with _ITp = long
unsigned int]',
inlined from 'std::__atomic_base<_IntTp>::operator __int_type() const [with
_ITp = long unsigned int]' at
/opt/compiler-explorer/gcc-trunk-20230111/include/c++/13.0.0/bits/atomic_base.h:365:20,
inlined from 'void f(std::shared_ptr)' at :13:69:
/opt/compiler-explorer/gcc-trunk-20230111/include/c++/13.0.0/bits/atomic_base.h:505:31:
error: 'long unsigned int __atomic_load_8(const volatile void*, int)' writing 8
bytes into a region of size 0 overflows the destination
[-Werror=stringop-overflow=]
  505 | return __atomic_load_n(&_M_i, int(__m));
  |~~~^
In function 'void f(std::shared_ptr)':
cc1plus: note: destination object is likely at address zero
cc1plus: some warnings being treated as errors
Compiler returned: 1


I have found bug #104475 which seems to also deal with atomics and
-Wstringop-overflow however I can't judge if it looks like a duplicate or a
different issue.

Cheers,
Romain

[Bug target/108293] Incorrect assembly emitted for float for BPF target

2023-01-11 Thread jemarch at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108293

Jose E. Marchesi  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Jose E. Marchesi  ---
Fixed.

[Bug tree-optimization/71343] missed optimization (can't "prove" shift and multiplication equivalence)

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71343

--- Comment #4 from CVS Commits  ---
The master branch has been updated by Roger Sayle:

https://gcc.gnu.org/g:98837d6e79dd27c15f5218f3f1ddf838cda4796c

commit r13-5111-g98837d6e79dd27c15f5218f3f1ddf838cda4796c
Author: Roger Sayle 
Date:   Wed Jan 11 16:54:58 2023 +

PR tree-optimization/71343: Value number X<<2 as X*4.

This patch is the second part of a fix for PR tree-optimization/71343,
that implements Richard Biener's suggestion of using tree-ssa's value
numbering instead of match.pd.  The change is that when assigning a
value number for the expression X<

gcc/ChangeLog
PR tree-optimization/71343
* tree-ssa-sccvn.cc (visit_nary_op) : Make
the value number of the expression X << C the same as the value
number for the multiplication X * (1<
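
The arithmetic identity behind this change can be checked directly for
unsigned operands, where both forms wrap modulo 2^N. The sketch below only
illustrates the equivalence X << C == X * (1 << C); it says nothing about how
tree-ssa-sccvn.cc implements it, and the function names are made up.

```c
#include <limits.h>

/* X << C written as a shift.  C must be less than the width of unsigned. */
unsigned shl(unsigned x, unsigned c)
{
  return x << c;
}

/* The same value written as a multiplication by 1 << C. */
unsigned mul_pow2(unsigned x, unsigned c)
{
  return x * (1u << c);
}
```

Giving both expressions the same value number is what would let later passes
treat a shift and the corresponding multiplication as the same computation.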

[Bug c/105180] [10/11/12/13 Regression] K&R style definition does not evaluate array size

2023-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org,
   ||jsm28 at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
int global = 0;

int
foo (void)
{
  ++global;
  return 1;
}

void
baz (void)
{
  void
  bar (s, c)
char *s;
char c[static foo ()];
  {
  }
  bar ("1", "1");
  bar ("1", "1");
  bar ("1", "1");
}

int
main ()
{
  baz ();
  if (global != 3)
    __builtin_abort ();
  return 0;
}

shows that for nested functions those side-effects are emitted, but at a wrong
location.
The side-effects in that case are evaluated when passing through the definition
of bar inside of the baz function,
rather than when bar is called.  So above, foo () is called just once, not 3
times.
If standard C declarations are used:
int global = 0;

int
foo (void)
{
  return ++global;
}

void
bar (char *s, char c[static foo ()])
{
}

int
main ()
{
  bar ("1", "1");
  if (global != 1)
    __builtin_abort ();
  return 0;
}
then it works properly, in that case the pending sizes are recorded by
c_parser_parms_list_declarator -> push_parm_decl -> grokdeclarator and queued
by get_parm_info called from c_parser_parms_list_declarator.
But in case of K&R argument declarations, those are done by:
  while (c_parser_next_token_is_not (parser, CPP_EOF)
         && c_parser_next_token_is_not (parser, CPP_OPEN_BRACE))
    c_parser_declaration_or_fndef (parser, false, false, false,
                                   true, false);
in c_parser_declaration_or_fndef and in that case that nested
c_parser_declaration_or_fndef calls start_decl which after
calling grokdeclarator which collects the pending expressions just does:
  if (expr)
    add_stmt (fold_convert (void_type_node, expr));
and so it is unclear where exactly it is pushed for the non-nested functions,
for nested ones at the current statement location (the definition of nested
function).
Bet we need to arrange for those side-effects to be instead remembered
somewhere and queued into pending_sizes later.

[Bug target/108293] Incorrect assembly emitted for float for BPF target

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108293

--- Comment #4 from CVS Commits  ---
The master branch has been updated by David Faust:

https://gcc.gnu.org/g:c7279270a2deda81eaeba37a87d721bee0ed6004

commit r13-5110-gc7279270a2deda81eaeba37a87d721bee0ed6004
Author: David Faust 
Date:   Tue Jan 10 10:53:12 2023 -0800

bpf: correct bpf_print_operand for floats [PR108293]

The existing logic in bpf_print_operand was only correct for integral
CONST_DOUBLEs, and emitted garbage for floating point modes. Fix it so
floating point mode operands are correctly handled.

PR target/108293

gcc/

* config/bpf/bpf.cc (bpf_print_operand): Correct handling for
floating point modes.

gcc/testsuite/

* gcc.target/bpf/double-1.c: New test.
* gcc.target/bpf/double-2.c: New test.
* gcc.target/bpf/float-1.c: New test.

[Bug modula2/108182] gm2 driver mishandles target and multilib options

2023-01-11 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108182

--- Comment #13 from Iain Sandoe  ---
Created attachment 54248
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54248&action=edit
Revised fix

This essentially makes Modula-2 build its include paths in the Front End (which
is how all the other compilers in GCC work too).

The big advantages are that the prefix and multilib info are all available
(as is the sysroot) with no changes needed to gcc/gcc.cc, and the prefix
correctly follows relocation of the compiler.

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Eric Botcazou  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|RESOLVED|REOPENED

--- Comment #17 from Eric Botcazou  ---
Yes, that's indeed the plan.

[Bug fortran/107424] [13 Regression] ICE in gfc_trans_omp_do, at fortran/trans-openmp.cc:5397 - and wrong code - with non-rectangular loops

2023-01-11 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107424

--- Comment #3 from Tobias Burnus  ---
With the 'gcc_assert' of comment 0 commented out, it compiles and produces the
following dump.
I don't understand why there is a 'lastprivate', and the use of 'i' in the
bounds is wrong: in the first iteration it is undefined, and afterwards it
always lags behind.

  #pragma omp simd lastprivate(count.0) collapse(2)
  for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
for (j = 1; j <= i; j = j + 1)
  {
i = count.0 * 2 + 1;
L.1:;
  }

And yet another variant:
   !$omp do simd collapse(2)
   do i = 1, 9, 2
  do j = 1, i, 2
i.e. both with non-unit strides. The result is still an ICE; with the assert
commented out, the result is:

    D.4265 = (i + 1) / 2;  // Oops! This should use 'count.1' and shall not be hoisted!
    #pragma omp for collapse(2)
      {
        {
          #pragma omp simd lastprivate(count.1) lastprivate(count.0) collapse(2)
          for (count.0 = 0; count.0 < 5; count.0 = count.0 + 1)
            for (count.1 = 0; count.1 < D.4265; count.1 = count.1 + 1)
              {
                i = count.0 * 2 + 1;
                j = count.1 * 2 + 1;
                L.1:;
              }

Here, COUNT is used in the inner loop - that would be also the option for the
stride==1 case, but as the expression needs to be in the condition already, it
might be better to have for inner stride == 1:
for (j = 1; j <= count.0 * 2 + 1; j = j + 1)
and for inner stride == 2:
for (j = 1; j <= (count.0 * 2 + 1 + 1) / 2; j = j + 1)

We probably need to check whether any of lb,ub,stride contains a parent loop
var.
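
As a cross-check of the bounds discussed above, the following C sketch runs the
`do i = 1, 9, 2` / `do j = 1, i, 2` nest twice: once directly, and once in
normalized form where the inner trip count is recomputed from the outer counter
on every outer iteration instead of being hoisted out of the nest. This is an
illustration with made-up names (`direct_sum`, `normalized_sum`, `count0`,
`count1`), not the GENERIC that gfortran emits.

```c
/* Direct form: i = 1,3,...,9 and j = 1,3,...,i.  Returns the sum of i*j
   and stores the number of inner iterations in *iters.  */
long direct_sum(int *iters)
{
  long sum = 0;
  *iters = 0;
  for (int i = 1; i <= 9; i += 2)
    for (int j = 1; j <= i; j += 2)
      {
        sum += (long) i * j;
        ++*iters;
      }
  return sum;
}

/* Normalized form: both counters start at 0 and step by 1.  The inner
   bound depends on count0 and is therefore evaluated inside the outer
   loop, never before it.  */
long normalized_sum(int *iters)
{
  long sum = 0;
  *iters = 0;
  for (int count0 = 0; count0 < 5; count0++)
    {
      int i = count0 * 2 + 1;
      int trips = (i + 1) / 2;  /* trip count of j = 1, i, 2 */
      for (int count1 = 0; count1 < trips; count1++)
        {
          int j = count1 * 2 + 1;
          sum += (long) i * j;
          ++*iters;
        }
    }
  return sum;
}
```

Both forms visit the same (i, j) pairs, which is the property the hoisted
`D.4265 = (i + 1) / 2` in the dump above fails to preserve.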

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Andreas Krebbel  changed:

   What|Removed |Added

Version|13.0|12.2.1

--- Comment #16 from Andreas Krebbel  ---
The testcase fails on GCC 12.2.1 as well. Should we apply it there as well
after giving it some time in mainline?

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread krebbel at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

Andreas Krebbel  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from Andreas Krebbel  ---
Your patch fixes the problem for me. Thanks for the quick fix!

[Bug fortran/107424] [13 Regression] ICE in gfc_trans_omp_do, at fortran/trans-openmp.cc:5397 - and wrong code - with non-rectangular loops

2023-01-11 Thread burnus at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107424

Tobias Burnus  changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org
   Keywords||ice-on-valid-code,
   ||wrong-code
Summary|[13 Regression] ICE in  |[13 Regression] ICE in
   |gfc_trans_omp_do, at|gfc_trans_omp_do, at
   |fortran/trans-openmp.cc:539 |fortran/trans-openmp.cc:539
   |7   |7 - and wrong code - with
   ||non-rectangular loops

--- Comment #2 from Tobias Burnus  ---
The following program does not ICE but it shows with -Wall:
  Warning: ‘i’ is used uninitialized
the code is as above (comment 0) except for:

   do i = 1, 9, 1   !  <<< only change: Stride == 1; ICE has stride == 2.
  do j = 1, i

Looking at the dump:

    integer(kind=4) D.4264;

    D.4264 = i;  // BAD: Uninitialized variable
    #pragma omp for collapse(2)
      {
        {
          #pragma omp simd collapse(2)
          for (i = 1; i <= 9; i = i + 1)
            for (j = 1; j <= D.4264; j = j + 1)  // Wrong: use loop var 'i' not 'D.4264'!
              {
                L.1:;
              }

--

Draft patch for this issue – but not for the original issue (the ICE):

--- a/gcc/fortran/trans-openmp.cc
+++ b/gcc/fortran/trans-openmp.cc
@@ -5224 +5224 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
-  from = gfc_evaluate_now (se.expr, pblock);
+  from = DECL_P (se.expr) ? se.expr : gfc_evaluate_now (se.expr, pblock);
@@ -5229 +5229 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
-  to = gfc_evaluate_now (se.expr, pblock);
+  to = DECL_P (se.expr) ? se.expr : gfc_evaluate_now (se.expr, pblock);
@@ -5234 +5234 @@ gfc_trans_omp_do (gfc_code *code, gfc_exec_op op, stmtblock_t *pblock,
-  step = gfc_evaluate_now (se.expr, pblock);
+  step = DECL_P (se.expr) ? se.expr : gfc_evaluate_now (se.expr, pblock);

[Bug tree-optimization/108334] Strange message in libgav1

2023-01-11 Thread lukaszcz18 at wp dot pl via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108334

--- Comment #5 from Jamaika  ---
Created attachment 54247
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54247&action=edit
Added zip files

[Bug modula2/108261] modula-2 module registration process seems to fail with shared libraries.

2023-01-11 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108261

--- Comment #18 from Gaius Mulley  ---
From the runtime perspective, your layered approach is much cleaner.
It would be good to allow users to use pim and some iso functionality,
or vice versa.

[Bug modula2/108261] modula-2 module registration process seems to fail with shared libraries.

2023-01-11 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108261

--- Comment #17 from Gaius Mulley  ---
Yes I was coming to the same conclusion (re: name mangling).

If each library module had its mangled name component set via -flibname=pim or
-flibname=iso etc.  Then we have one universe of distinct named modules.

All ctors fire on startup registering themselves with M2RTS
(M2Dependents).

main calls M2RTS to traverse the import graph marking the used
modules.  (From the linking perspective it wouldn't matter if there
were two Storages with different API used by 3rd party libraries etc).

I'm liking the simplicity of the name mangling implementation.

[Bug tree-optimization/108199] Bitfields, unions and SRA and storage_order_attribute

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108199

--- Comment #14 from CVS Commits  ---
The master branch has been updated by Eric Botcazou:

https://gcc.gnu.org/g:3e1cba12a8d71e70235a9a9b8f1a237a561db3e7

commit r13-5109-g3e1cba12a8d71e70235a9a9b8f1a237a561db3e7
Author: Eric Botcazou 
Date:   Wed Jan 11 15:58:47 2023 +0100

Fix problematic interaction between bitfields, unions, SSO and SRA

The handling of bitfields by the SRA pass is peculiar and this must be taken
into account to support the scalar_storage_order attribute.

gcc/
PR tree-optimization/108199
* tree-sra.cc (sra_modify_expr): Deal with reverse storage order
for bit-field references.

gcc/testsuite/
* gcc.dg/sso-17.c: New test.

[Bug modula2/108261] modula-2 module registration process seems to fail with shared libraries.

2023-01-11 Thread iains at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108261

--- Comment #16 from Iain Sandoe  ---
(In reply to Iain Sandoe from comment #13)
> (In reply to Iain Sandoe from comment #12)
> > (In reply to Gaius Mulley from comment #11)
> > > > when a module has the same name but a different interface are the
> > >   symbols distinct (i.e. mangled differently)?
> > > 
> > > no.  So, as you say, the ordering of the static link works fine.  I
> > > had assumed that dynamic libraries also adhered to a similar ordering.
> > > From what we are observing it seems that all the ctors fire but the
> > > API integrity is preserved due to library ordering?  (Or have I
> > > misunderstood dynamic linking?).  (Or worse this might be true on
> > > gnu/linux but not on other platforms?).
> > 
> > comment #6 seems to indicate possible issues on linux too? (or I
> > misunderstand)
> > 
> > To find out what's actually happening will mean digging through the init in
> > the debugger .. 
> 
> One additional thought, perhaps lazy binding could be responsible; usually
> Darwin will not bind symbols on load, but on first use (speeds up startup). 
> However there is an option to force bind-on-load (when I get a chance, will
> try that 

That did not resolve the problem.

Actually, to come back to the first conversation we had (about the
cross-linking issue) .. the underlying problem is a layering one.

Assuming that having multiple symbols with the same name in one process is not
reliable ...
... and that we cannot (easily) rename one set,

the simplest solution is:
  - define a libm2com.so (containing the modules common to iso and pim).

  - make each of libm2iso and libm2pim depend on libm2com.

 so we have
   if (iso)
 libs = m2iso,m2com
   else
 libs = m2pim,m2com

That means we can get rid of the sledgehammer of "undefined, dynamic_lookup"
and we have no run-time symbol clashes.

We have suggested this before in various discussions .. and I guess it is a
bunch of configure work .. but I am beginning to think it is going to be a lot
less work than trying to solve the unknown issues we have now.

(We can, of course, make -fscaffold-static the default as a work-around, but
then scaffold-dynamic is still essentially unusable.)

[Bug tree-optimization/108355] [13 Regression] Dead Code Elimination Regression at -O2 since r13-2772-g9baee6181b4e42

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108355

--- Comment #2 from Richard Biener  ---
Basically get_ref_base_and_extent trusts TYPE_SIZE of the array operand of an
ARRAY_REF (it would also use range info on an SSA name index, but we'd expect
a singleton to be propagated there already).  In principle that would be
a general "folding" trick we can apply to more consistently handle this
situation across passes.  gimplification might be the "nicest" place to
handle this, OTOH inlining / CCP might turn VLAs into [1] as well.

[Bug libstdc++/108225] canadian compilation of gdb error for libstdc++'s std_mutex.h on x86_64-w64-mingw32 host

2023-01-11 Thread ebotcazou at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108225

--- Comment #26 from Eric Botcazou  ---
> For --enable-threads=posix the value of _WIN32_WINNT doesn't matter, so we
> don't want to disable _GLIBCXX_HAS_GTHREADS in that case. That's why we need
> to include , to find out if it's actually affected by _WIN32_WINNT.

OK, indeed I totally overlooked the other threading models...

> But IMHO we don't want to include  unconditionally in every single
> libstdc++ header (because they all include  which includes
> this os_defines.h header). So only do it if _WIN32_WINNT has been set to
> some ancient value, because otherwise the problem doesn't exist.

Fair enough, although it only drags stdlib.h and sys/timeb.h if you define
__GTHREAD_HIDE_WIN32API like libstdc++ does.

[Bug c/108370] gcc doesn't merge bitwise-AND if an explicit comparison against 0 is given

2023-01-11 Thread dhowells at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370

--- Comment #3 from dhowells at redhat dot com  ---
We don't want to do:

   return ((unsigned int) bio->bi_flags >> bit & 1) != 0;

if we can avoid it as "bit" is usually constant - though I'm guessing the
optimiser should handle that?

[Bug modula2/108373] New: Update 'contrib/gcc_update:files_and_dependencies' for Modula-2

2023-01-11 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108373

Bug ID: 108373
   Summary: Update 'contrib/gcc_update:files_and_dependencies' for
Modula-2
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: modula2
  Assignee: gaius at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
  Target Milestone: ---

Given the amount of generated Auto* files that it brought in, I suppose we need
to update 'contrib/gcc_update:files_and_dependencies' for Modula-2.

[Bug preprocessor/108372] New: [12 regression] -E -fdirectives-only crash

2023-01-11 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108372

Bug ID: 108372
   Summary: [12 regression] -E -fdirectives-only crash
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: preprocessor
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thiago at kde dot org
  Target Milestone: ---

Probably similar to many other bugs related to -E -fdirectives-only. This
option is used by icecc.

Test:
g++ -std=c++17 -include type_traits -E -xc++ /dev/null -o /dev/null -fdirectives-only

Tested with:
gcc version 13.0.0 20230110 (experimental) (GCC)

Output:
In file included from :
/home/tjmaciei/dev/gcc/include/c++/13.0.0/type_traits:2948:25: error: missing
binary operator before token "("
 2948 |bool _Nothrow = noexcept(_S_conv<_Tp>(_S_get())),
  | ^
/home/tjmaciei/dev/gcc/include/c++/13.0.0/type_traits:3033:27: internal
compiler error: unspellable token PRAGMA_EOL
 3033 | ~__nonesuch() = delete;
  |   ^
0xc4c472 c_cpp_diagnostic(cpp_reader*, cpp_diagnostic_level,
cpp_warning_reason, rich_location*, char const*, __va_list_tag (*) [1])
/home/tjmaciei/src/gcc/gcc/c-family/c-common.cc:6694
0x229a914 cpp_diagnostic_at
/home/tjmaciei/src/gcc/libcpp/errors.cc:67
0x229a914 cpp_diagnostic
/home/tjmaciei/src/gcc/libcpp/errors.cc:82
0x229aa73 cpp_error(cpp_reader*, cpp_diagnostic_level, char const*, ...)
/home/tjmaciei/src/gcc/libcpp/errors.cc:96
0x22a58d3 cpp_spell_token(cpp_reader*, cpp_token const*, unsigned char*, bool)
/home/tjmaciei/src/gcc/libcpp/lex.cc:4426
0x22a663a cpp_token_as_text(cpp_reader*, cpp_token const*)
/home/tjmaciei/src/gcc/libcpp/lex.cc:4442
0x229e43c _cpp_parse_expr
/home/tjmaciei/src/gcc/libcpp/expr.cc:1389
0x2296981 do_if
/home/tjmaciei/src/gcc/libcpp/directives.cc:2076
0x2298b68 _cpp_handle_directive
/home/tjmaciei/src/gcc/libcpp/directives.cc:572
0x22a6e7d cpp_directive_only_process(cpp_reader*, void*, void (*)(cpp_reader*,
CPP_DO_task, void*, ...))
/home/tjmaciei/src/gcc/libcpp/lex.cc:5272
0xc76faf scan_translation_unit_directives_only
/home/tjmaciei/src/gcc/gcc/c-family/c-ppoutput.cc:431
0xc76faf preprocess_file(cpp_reader*)
/home/tjmaciei/src/gcc/gcc/c-family/c-ppoutput.cc:104
0xc750b8 c_common_init()
/home/tjmaciei/src/gcc/gcc/c-family/c-opts.cc:1227
0xa6d0ce cxx_init()
/home/tjmaciei/src/gcc/gcc/cp/lex.cc:338
0x95d1c1 lang_dependent_init
/home/tjmaciei/src/gcc/gcc/toplev.cc:1815
0x95d1c1 do_compile
/home/tjmaciei/src/gcc/gcc/toplev.cc:2110
Please submit a full bug report, with preprocessed source.
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/108240] [13 Regression] ICE in emit_library_call_value_1 at gcc/calls.cc:4181 since r13-4894-gacc727cf02a144

2023-01-11 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108240

--- Comment #8 from Kewen Lin  ---
(In reply to Segher Boessenkool from comment #7)
> -m64 requires 64-bit instructions.  We will ICE if we try to generate code
> for -m64 without support for 64-bit insns enabled in the compiler.  For
> example, the stdu insn is required to implement the ABI sanely.
> 

The current behavior with an explicit -m64 on the command line doesn't violate
this: an explicitly given -m64 always enables powerpc64, so -m64 compilation
always has 64-bit insns enabled. This is the same both before and after
r13-4894.

> If the user said they want a -mcpu= for a CPU that has no 64-bit insns,
> but also wants to use -m64, we should just say sorry, that won't fly.

I agree that this is a sensible thing to look into and implement. But changing
the behavior fully like this (on Linux, AIX and Darwin, in a 64-bit environment
with or without an explicit -m64) is a big adjustment compared with the
previous behavior.

Before r13-4894, the case "explicit -m64 + CPU without 64-bit insns +
Linux/AIX/Darwin" did not emit an error, and the case "no explicit -m64 + CPU
without 64-bit insns + AIX/Darwin" only emitted a warning. Only the case "no
explicit -m64 + CPU without 64-bit insns + Linux" emitted an error before
r13-4894. After the culprit commit it no longer emits an error; this part is a
regression, and the proposed patch can fix it.
For the other cases, which emit no error either before or after r13-4894,
making them emit an error is new behavior, and it could cost non-trivial effort
(at least for testing and for fixing possible fallout).

[Bug target/108371] gcc for x86_64 may sign/zero extent arguments unnecessarily

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-01-11
   Keywords||missed-optimization
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Confirmed.  This is the C frontend doing

;; Function bio_release_pages (null)
;; enabled by -tree-original


{
  __bio_release_pages ((int) mark_dirty);
}

aka targetm.calls.promote_prototypes

[Bug c/105972] [12/13 Regression] ICE in lower_stmt, at gimple-low.cc:312 since r12-4608-gb4702276615ff8d4

2023-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105972

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Created attachment 54246
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54246&action=edit
gcc13-pr105972.patch

Untested fix.

[Bug c/108370] gcc doesn't merge bitwise-AND if an explicit comparison against 0 is given

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370

--- Comment #2 from Richard Biener  ---
ifcombine seems to assume that

 D.1987_7 = op0 & 1;
 if (D.1987_7 != 0)

is canonical but we see

  _9 = (_Bool) _6;
  if (_9 != 0)

instead.  That's already the form introduced by inlining from

_Bool bio_flagged (struct bio * bio, unsigned int bit)
{
  short unsigned int _1;
  unsigned int _2;
  unsigned int _3;
  unsigned int _4;
  _Bool _8;

   :
  _1 = bio_6(D)->bi_flags;
  _2 = (unsigned int) _1;
  _3 = _2 >> bit_7(D);
  _4 = _3 & 1;
  _8 = _4 != 0;
  return _8;

and

   :
  _1 = bio_flagged (bio_7(D), 0);
  if (_1 != 0)
goto ; [INV]
  else
goto ; [INV]

[Bug c/105972] [12/13 Regression] ICE in lower_stmt, at gimple-low.cc:312 since r12-4608-gb4702276615ff8d4

2023-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105972

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org
   Priority|P4  |P2

--- Comment #5 from Jakub Jelinek  ---
This ICEs also on the valid
__attribute__((optimize(0))) int
foo ()
{
  int
  bar (x)
    int x;
  {
    return x;
  }
  return bar (0);
}
with -O2 -g.
If I comment out the optimize attribute, then with -O2 -g
-fdump-tree-gimple-lineno
one can see:
  [pr105972-2.c:5:3] # DEBUG BEGIN_STMT
  [pr105972-2.c:7:5] # DEBUG BEGIN_STMT
  [pr105972-2.c:11:3] # DEBUG BEGIN_STMT
statements in foo and
  [pr105972-2.c:9:5] # DEBUG BEGIN_STMT
in bar.  The line 7 statement is just incorrect; IMHO it shouldn't be added at
all: when parsing the K&R parameter declarations, there is no reasonable code
point at which to emit those.
For normal K&R functions, they aren't just emitted because

[Bug libstdc++/103755] {has,use}_facet() and iostream constructor performance

2023-01-11 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103755

--- Comment #16 from Jonathan Wakely  ---
Unfortunately this change causes a regression for libs that were statically
linked to libstdc++.a before the PR 91057 fix. Any object which has the buggy
std::locale::id::_M_id() code linked into it can get corrupted
locale::_Impl::_M_facet arrays, where the facets are at the wrong indices.

Before the introduction of __try_use_facet those corrupted _M_facet arrays
would result in a failed dynamic_cast and so has_facet would be false and
use_facet would throw. With the new code in GCC 13 the static_cast succeeds,
but with undefined behaviour.

So to avoid a regression from detecting the bug and throwing an exception to
crashing with a segfault, I think we need to change __try_use_facet to use
dynamic_cast, unfortunately.

We will still retain the use of __try_use_facet in
std::basic_ios::_M_cache_locale, so we'll still only do three dynamic_casts not
six, so that's still a bit better than it was before.

[Bug target/108371] New: gcc for x86_64 may sign/zero extent arguments unnecessarily

2023-01-11 Thread dhowells at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371

Bug ID: 108371
   Summary: gcc for x86_64 may sign/zero extent arguments
unnecessarily
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dhowells at redhat dot com
  Target Milestone: ---

When compiling for x86_64, bool, char and short arguments that are passed
directly to an argument of exactly the same type on another function with no
modification, e.g.:

   void __bio_release_pages(char mark_dirty);
   void bio_release_pages(char mark_dirty)
   {
           __bio_release_pages(mark_dirty);
   }

get sign/zero-extended unnecessarily.  In the case of the above code, it
compiles to:

   0:   40 0f be ff             movsbl %dil,%edi
   4:   e9 00 00 00 00          jmp    9

with "gcc -Os -c test.c".  Can the extension be optimised away?  Granted, the
upper bits of RDI could contain rubbish on entry to bio_release_pages(),
so sanitisation is not unreasonable - but on the other hand,
__bio_release_pages() would surely have to assume the same and do the same
sanitisation?

The toolchain used is the Fedora 37 system compiler:

gcc-12.2.1-4.fc37.x86_64
binutils-2.38-25.fc37.x86_64
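
A minimal sketch of the source-level situation described above, under stated assumptions: `callee_sketch`, `forwarder_sketch`, `observed_after_forwarding` and `check_forwarding` are hypothetical names, and the callee merely records its argument. It does not show the generated code; it only illustrates that, at the C level, the value the callee observes is the plain conversion of the `char` to `int`, whether or not the forwarder also extends it.

```c
#include <assert.h>

/* Hypothetical stand-in for __bio_release_pages(): it only records the
   value it received so the forwarding can be observed.  */
static int last_seen;

void callee_sketch(char mark_dirty)
{
        last_seen = mark_dirty;         /* char is converted to int here */
}

/* Same shape as bio_release_pages() in the report: the argument is
   forwarded unmodified to a parameter of exactly the same type.  */
void forwarder_sketch(char mark_dirty)
{
        callee_sketch(mark_dirty);
}

/* Helper for experiments: forward c and return what the callee observed.  */
int observed_after_forwarding(char c)
{
        forwarder_sketch(c);
        return last_seen;
}

/* The observed value equals the plain char-to-int conversion for every
   input, so an extra extension in the forwarder adds nothing that is
   visible at the C level.  */
void check_forwarding(void)
{
        for (int v = -128; v <= 127; ++v) {
                char c = (char)v;
                assert(observed_after_forwarding(c) == (int)c);
        }
}
```

Compiling a file like this with the `gcc -Os -c` invocation from the report and disassembling it is one way to look for the extension in the forwarder, although the compiler may inline calls within a single translation unit.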

[Bug c/108370] gcc doesn't merge bitwise-AND if an explicit comparison against 0 is given

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370

Richard Biener  changed:

   What|Removed |Added

   Last reconfirmed||2023-01-11
 Ever confirmed|0   |1
   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW

--- Comment #1 from Richard Biener  ---
The good case is handled by ifcombine:

optimizing bits or bits test to _5 & T != 0
with temporary T = 2 | 1
Merging blocks 2 and 3

   [local count: 1073741824]:
  _5 = bio_4(D)->bi_flags;
  _8 = _5 & 1;
  if (_8 != 0)
goto ; [33.00%]
  else
goto ; [67.00%]

   [local count: 719407025]:
  _9 = _5 & 2;
  if (_9 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 714038313]:
  _1 = (int) mark_dirty_6(D);
  __bio_release_pages (bio_4(D), _1);

but the bad case is not handled:

   [local count: 1073741824]:
  _6 = bio_4(D)->bi_flags;
  _5 = (unsigned int) _6;
  _9 = (_Bool) _6;
  if (_9 != 0)
goto ; [33.00%]
  else
goto ; [67.00%]

   [local count: 719407025]:
  _10 = _5 >> 1;
  _11 = (_Bool) _10;
  if (_11 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [local count: 714038313]:
  _1 = (int) mark_dirty_7(D);
  __bio_release_pages (bio_4(D), _1);

it looks like some premature optimization triggered (not) there.

.original is (good)

;; Function bio_flagged (null)
;; enabled by -tree-original


{
  return ((unsigned int) bio->bi_flags & 1 << bit) != 0;
}

vs (bad)

;; Function bio_flagged (null)
;; enabled by -tree-original


{
  return ((unsigned int) bio->bi_flags >> bit & 1) != 0;
}

I suppose one variant is folded after the promotion to bool and one
before only.
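
The two .original forms above compute the same predicate, which is why one would hope ifcombine could treat them alike. A small sketch that checks this exhaustively for a 16-bit flag word follows; the function names are hypothetical, and `1u` is used in place of the dump's signed `1` so that the sketch stays well defined for every bit it tests.

```c
#include <assert.h>
#include <stdbool.h>

/* "Good" form: build the mask by shifting 1, then AND with the flags.  */
static bool test_mask_form(unsigned short flags, unsigned int bit)
{
        return ((unsigned int)flags & (1u << bit)) != 0;
}

/* "Bad" form: shift the flags down, then AND with 1.  */
static bool test_shift_form(unsigned short flags, unsigned int bit)
{
        return (((unsigned int)flags >> bit) & 1u) != 0;
}

/* Compare both forms for every 16-bit flag word and every bit 0..15.  */
static void check_forms_agree(void)
{
        for (unsigned int flags = 0; flags <= 0xffffu; ++flags)
                for (unsigned int bit = 0; bit < 16; ++bit)
                        assert(test_mask_form((unsigned short)flags, bit)
                               == test_shift_form((unsigned short)flags, bit));
}
```

Calling `check_forms_agree()` from a test program completes without tripping an assertion, so the difference between the good and bad cases lies only in the shape of the IL, not in the semantics.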

[Bug fortran/108349] LTO mismatch for __builtin_realloc between glibc and gfortran frontend

2023-01-11 Thread tschwinge at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108349

Thomas Schwinge  changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org,
   ||tschwinge at gcc dot gnu.org,
   ||vries at gcc dot gnu.org

--- Comment #6 from Thomas Schwinge  ---
As nvptx target is known to be sensitive to such mismatches (outside of the LTO
context reported here), I individually did test this commit
r13-5100-g0986c351aa8a9f08b3cb614baec13564dd62c114 "fortran: Fix up function
types for realloc and sincos{,f,l} builtins [PR108349]", and found that it also
resolves the following nvptx target compilation failures:

'gfortran.dg/pr35662.f90':

ptxas /tmp/ccYNgEEN.o, line 44; error   : Illegal operand type to
instruction 'st'
ptxas /tmp/ccYNgEEN.o, line 51; error   : Type of argument does not match
formal parameter '%in_ar0'
ptxas /tmp/ccYNgEEN.o, line 51; error   : Alignment of argument does not
match formal parameter '%in_ar0'
ptxas /tmp/ccYNgEEN.o, line 44; error   : Unknown symbol '%stack'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status

'gfortran.fortran-torture/compile/pr37236.f':

ptxas [...]/build-gcc/gcc/testsuite/gfortran/pr37236.o, line 269; error   :
Illegal operand type to instruction 'st'
ptxas [...]/build-gcc/gcc/testsuite/gfortran/pr37236.o, line 275; error   :
Type of argument does not match formal parameter '%in_ar0'
ptxas [...]/build-gcc/gcc/testsuite/gfortran/pr37236.o, line 269; error   :
Unknown symbol '%stack'
ptxas fatal   : Ptx assembly aborted due to errors
nvptx-as: ptxas returned 255 exit status

These are now all-PASS.


In the nvptx target gfortran test suite logs remain however dozens more similar
instances.  I've not checked if what's underlying those would also be exposing
the same kind of LTO problem.

[Bug tree-optimization/108368] [13 Regression] Dead Code Elimination Regression at -O3 since r13-1759-gdbb093f4f15

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108368

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|--- |13.0

--- Comment #1 from Richard Biener  ---
range_from_dom looks expensive as well and the bb vs. prev_bb in walking
immediate dominators is quite confusing (not to speak about resolve_dom ...).

[Bug c/108370] New: gcc doesn't merge bitwise-AND if an explicit comparison against 0 is given

2023-01-11 Thread dhowells at redhat dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370

Bug ID: 108370
   Summary: gcc doesn't merge bitwise-AND if an explicit
comparison against 0 is given
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: dhowells at redhat dot com
  Target Milestone: ---

Created attachment 54245
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54245&action=edit
Demo code

If gcc sees a couple of calls to an inline function that does a bitwise-AND and
returns whether the result was zero or non-zero (e.g. a flag check helper), gcc
cannot merge them if the result of the AND is explicitly compared against 0,
even if the function's return type is a bool (which would do that anyway).  For
example:

   static inline bool bio_flagged(struct bio *bio, unsigned int bit)
   {
           return (bio->bi_flags & (1U << bit)) != 0;
   }

   void bio_release_pages(struct bio *bio, bool mark_dirty)
   {
           if (bio_flagged(bio, BIO_PAGE_REFFED) ||
               bio_flagged(bio, BIO_PAGE_PINNED))
                   __bio_release_pages(bio, mark_dirty);
   }

compiles bio_release_pages() to:

   0:   66 8b 07                mov    (%rdi),%ax
   3:   a8 01                   test   $0x1,%al
   5:   75 04                   jne    b
   7:   a8 02                   test   $0x2,%al
   9:   74 09                   je     14
   b:   40 0f b6 f6             movzbl %sil,%esi
   f:   e9 00 00 00 00          jmp    14
  14:   c3                      ret

but:

   static inline bool bio_flagged(struct bio *bio, unsigned int bit)
   {
           return bio->bi_flags & (1U << bit);
   }

gives:

   0:   f6 07 03                testb  $0x3,(%rdi)
   3:   74 09                   je     e
   5:   40 0f b6 f6             movzbl %sil,%esi
   9:   e9 00 00 00 00          jmp    e
   e:   c3                      ret

Possibly the comparison against 0 could be optimised away.

I've attached some demo code that can be compiled with one of:

gcc -Os -c gcc-bool-demo.c
gcc -Os -c gcc-bool-demo.c -Dfix

The gcc I used above is the Fedora 37 system compiler:

gcc-12.2.1-4.fc37.x86_64
binutils-2.38-25.fc37.x86_64

but similar results can be seen with the Fedora arm cross-compiler:

   0:   e1d030b0        ldrh    r3, [r0]
   4:   e3130001        tst     r3, #1
   8:   1a000001        bne     14
   c:   e3130002        tst     r3, #2
  10:   012fff1e        bxeq    lr
  14:   eafffffe        b       0 <__bio_release_pages>

vs

   0:   e1d030b0        ldrh    r3, [r0]
   4:   e3130003        tst     r3, #3
   8:   012fff1e        bxeq    lr
   c:   eafffffe        b       0 <__bio_release_pages>
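
The merge that the second variant gets (a single test against `$0x3`) is valid for both variants, since testing bit 0 or bit 1 separately is the same as testing the combined mask once. A minimal sketch of that equivalence follows. The reduced `struct bio` and the helper names are hypothetical, and bit numbers 0 and 1 are taken from the `$0x1` and `$0x2` tests in the disassembly rather than from the real flag definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced, hypothetical layout: only the 16-bit flag word matters here.  */
struct bio {
        unsigned short bi_flags;
};

/* The variant from the report with the explicit comparison against 0.  */
static bool bio_flagged(const struct bio *bio, unsigned int bit)
{
        return (bio->bi_flags & (1U << bit)) != 0;
}

/* Two separate single-bit tests, as the source is written.  */
static bool either_separate(const struct bio *bio)
{
        return bio_flagged(bio, 0) || bio_flagged(bio, 1);
}

/* One combined test, corresponding to "testb $0x3".  */
static bool either_merged(const struct bio *bio)
{
        return (bio->bi_flags & 0x3) != 0;
}

/* Check that both formulations agree for every possible flag word.  */
static void check_merge_is_valid(void)
{
        for (unsigned int f = 0; f <= 0xffffu; ++f) {
                struct bio b = { (unsigned short)f };
                assert(either_separate(&b) == either_merged(&b));
        }
}
```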

[Bug fortran/108369] New: FM509 Fails to compile with error

2023-01-11 Thread ben.brewer at codethink dot co.uk via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108369

Bug ID: 108369
   Summary: FM509 Fails to compile with error
   Product: gcc
   Version: 11.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: fortran
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ben.brewer at codethink dot co.uk
  Target Milestone: ---

7 years ago I developed a legacy Fortran transpiler. To test that it is
behaviourally correct, the test suite runs gfortran over the sources, then
runs gfortran over the transpiled sources and compares the output.

Running this project recently showed numerous behavioural changes/errors and if
this is of interest then the project can be found here:
https://github.com/CodethinkLabs/ofc

The test suite I run against (mostly legacy) can be found here:
https://github.com/CodethinkLabs/ofc-tests

The most critical of these behavioural changes is that the NIST F77 test suite
fails on FM509.FOR with the following:
"Error: Actual argument contains too few elements for dummy argument ‘c1d001’
(19/48) at (1)"

[Bug tree-optimization/108368] New: [13 Regression] Dead Code Elimination Regression at -O3 since r13-1759-gdbb093f4f15

2023-01-11 Thread yann at ywg dot ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108368

Bug ID: 108368
   Summary: [13 Regression] Dead Code Elimination Regression at
-O3 since r13-1759-gdbb093f4f15
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: yann at ywg dot ch
  Target Milestone: ---

Created attachment 54244
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54244&action=edit
Case file

cat case.c #1409
int a;
int b;
short c;
static int d = 4;
void foo();
void e(int f) {}
int main() {
  unsigned g = -27;
  for (; g > 7; ++g) {
    e(c && g);
    b ? 0 : a;
    if (g) {
      if (0 >= d)
        foo();
    } else
      d = 0;
  }
}

`gcc-f99d7d669eaa2830eb5878df4da67e77ec791522 (trunk) -O3` can not eliminate
`foo` but `gcc-releases/gcc-12.2.0 -O3` can.
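
Why `foo` is dead can be seen by replaying the loop by hand. The sketch below is a hypothetical instrumented copy of the loop from case.c, not part of the testcase: the call to `foo()` is replaced by a counter and `d` becomes a field of a result struct. Inside the loop `g > 7` holds, so `g` is never zero, `d` is never set to 0, and `0 >= d` is never true.

```c
#include <assert.h>

/* Result of replaying the loop: iteration count, number of times the
   foo() call site was reached, and the final value of d.  */
struct trace {
        unsigned iterations;
        unsigned foo_calls;
        int final_d;
};

static struct trace replay_case(void)
{
        struct trace t = { 0, 0, 4 };   /* d starts at 4, as in case.c */
        unsigned g = -27;               /* converts to UINT_MAX - 26 */

        for (; g > 7; ++g) {
                ++t.iterations;
                if (g) {
                        if (0 >= t.final_d)
                                ++t.foo_calls;  /* case.c calls foo() here */
                } else
                        t.final_d = 0;
        }
        return t;
}

/* The loop runs until g wraps around to 0, which takes 27 increments.  */
static void check_replay(void)
{
        struct trace t = replay_case();
        assert(t.iterations == 27);
        assert(t.foo_calls == 0);
        assert(t.final_d == 4);
}
```

The 27 iterations match the `movl $27, %ebx` counter in the trunk output of this report.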

`gcc-f99d7d669eaa2830eb5878df4da67e77ec791522 (trunk) -O3 -S -o /dev/stdout
case.c`
- OUTPUT -
main:
.LFB1:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movl    $27, %ebx
.L5:
        movl    d(%rip), %eax
        testl   %eax, %eax
        jle     .L8
.L4:
        xorl    %eax, %eax
        popq    %rbx
        .cfi_remember_state
        .cfi_def_cfa_offset 8
        ret
        .p2align 4,,10
        .p2align 3
.L8:
        .cfi_restore_state
        xorl    %eax, %eax
        call    foo
        subl    $1, %ebx
        jne     .L5
        jmp     .L4
-- END OUTPUT -


`gcc-releases/gcc-12.2.0 -O3 -S -o /dev/stdout case.c`
- OUTPUT -
main:
.LFB1:
        .cfi_startproc
        xorl    %eax, %eax
        ret
-- END OUTPUT -


Bisects to: r13-1759-gdbb093f4f15

commit dbb093f4f15ea66f2ce5cd2dc1903a6894563356
Author: Andrew MacLeod 
Date:   Mon Jul 18 15:04:23 2022 -0400

Resolve complicated join nodes in range_from_dom.

Join nodes which carry outgoing ranges on incoming edges are uncommon,
but can still be resolved by setting the dominator range, and then
calculating incoming edges.  Avoid doing so if one of the incoming edges
is not dominated by the same dominator.

* gimple-range-cache.cc (ranger_cache::range_from_dom): Check
  for incoming ranges on join nodes and add to worklist.

[Bug target/105554] [10/11/12/13 Regression] ICE: in emit_block_move_hints, at expr.cc:1829 since r9-5509-g5928bc2ec06dd4e7

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105554

--- Comment #8 from Richard Biener  ---
(In reply to Richard Biener from comment #7)
> (In reply to Martin Liška from comment #6)
> > We fail in the param assignment:
> > 
> > (gdb) pp x
> > (reg:V4DI 82)
> > (gdb) pp y
> > (mem/c:BLK (reg/f:DI 76 virtual-incoming-args) [1 x+0 S32 A256])
> > 
> > So we will likely need something similar to what we have in tree-inline.cc:
> > 
> >   5928/* For vector typed decls make sure to update DECL_MODE according
> >   5929   to the new function context.  */
> >   5930if (VECTOR_TYPE_P (TREE_TYPE (copy)))
> >   5931  SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));
> > 
> > @Richi: Do you have a clue where to adjust it?
> 
> I think it goes wrong in use_register_for_decl (called from
> assign_parm_setup_block).
> 
> diff --git a/gcc/function.cc b/gcc/function.cc
> index d975b001ec9..b54f9b33c6a 100644
> --- a/gcc/function.cc
> +++ b/gcc/function.cc
> @@ -2229,7 +2229,9 @@ use_register_for_decl (const_tree decl)
>  }
>  
>/* Only register-like things go in registers.  */
> -  if (DECL_MODE (decl) == BLKmode)
> +  if (DECL_MODE (decl) == BLKmode
> +  || (VECTOR_TYPE_P (TREE_TYPE (decl))
> + && TYPE_MODE (TREE_TYPE (decl)) == BLKmode))
>  return false;
>  
>/* If -ffloat-store specified, don't put explicit float variables
> 
> fixes the ICE, not sure if we should adjust the PARM_DECLs mode somewhere
> in target cloning instead though?

Like in copy_arguments_nochange?

[Bug target/105554] [10/11/12/13 Regression] ICE: in emit_block_move_hints, at expr.cc:1829 since r9-5509-g5928bc2ec06dd4e7

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105554

--- Comment #7 from Richard Biener  ---
(In reply to Martin Liška from comment #6)
> We fail in the param assignment:
> 
> (gdb) pp x
> (reg:V4DI 82)
> (gdb) pp y
> (mem/c:BLK (reg/f:DI 76 virtual-incoming-args) [1 x+0 S32 A256])
> 
> So we will likely need something similar to what we have in tree-inline.cc:
> 
>   /* For vector typed decls make sure to update DECL_MODE according
>      to the new function context.  */
>   if (VECTOR_TYPE_P (TREE_TYPE (copy)))
>     SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));
> 
> @Richi: Do you have a clue where to adjust it?

I think it goes wrong in use_register_for_decl (called from
assign_parm_setup_block).

diff --git a/gcc/function.cc b/gcc/function.cc
index d975b001ec9..b54f9b33c6a 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -2229,7 +2229,9 @@ use_register_for_decl (const_tree decl)
 }

   /* Only register-like things go in registers.  */
-  if (DECL_MODE (decl) == BLKmode)
+  if (DECL_MODE (decl) == BLKmode
+  || (VECTOR_TYPE_P (TREE_TYPE (decl))
+ && TYPE_MODE (TREE_TYPE (decl)) == BLKmode))
 return false;

   /* If -ffloat-store specified, don't put explicit float variables

fixes the ICE, not sure if we should adjust the PARM_DECLs mode somewhere
in target cloning instead though?

[Bug tree-optimization/108367] [12/13 Regression] ICE: verify_ssa failed (error: definition in block 4 does not dominate use in block 3) since r12-5138-ge82c382971664d6f

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108367

--- Comment #2 from Richard Biener  ---
The OACC parloops pass adds a loop-around edge which short-cuts the definition
of the debug use.  I don't see anywhere how that handles debug stmts - maybe it
gets fixed during lowering ... (yes, with -fno-checking it looks OK after
ompexpssa1).

The offending rev. is most certainly not the cause of the actual issue.

[Bug middle-end/105126] [10/11/12/13 Regression] Optimization regression gcc inserts not needed movsx when using switch statement

2023-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105126

Martin Liška  changed:

   What|Removed |Added

   Assignee|marxin at gcc dot gnu.org  |unassigned at gcc dot gnu.org
 Status|ASSIGNED|NEW

[Bug tree-optimization/108366] [12/13 Regression] Spurious stringop overflow, possibly alias-related since r12-145-gd1d01a66012a93cc

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108366

Richard Biener  changed:

   What|Removed |Added

   Keywords||missed-optimization
   Last reconfirmed||2023-01-11
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #4 from Richard Biener  ---
Warns from

#1  0x013bc420 in warn_for_access (loc=2147485003, 
func=, exp=, 
opt=761, range=0x7fffd560, size=, 
write=true, read=false, maybe=false)
at /home/rguenther/src/gcc-12-branch/gcc/gimple-ssa-warn-access.cc:995
(gdb) l
990 }
991
992   if (write)
993 {
994   if (tree_int_cst_equal (range[0], range[1]))
995 warned = (func
996   ? warning_n (loc, opt, tree_to_uhwi (range[0]),
997(maybe
998 ? G_("%qD may write %E byte into a
region "
999  "of size %E")
(gdb) p debug_gimple_stmt (exp)
# .MEM_2 = VDEF <.MEM_23>
memset (  [(void *)], 65, 128);

on a path where actual.m_outline == nullptr

for some unknown reason we reload actual.m_outline in the loop, likely
because storing to it is thought to clobber actual.m_outline
(which is initialized from a new expression).  Note 'actual' escapes
the function via the printf call and 'new' can inspect/clobber globals.

We're also "bad" in computing points-to info because of the

memset(buffer.data(), 'A', new_size);

which with

char* data() {
    if (m_outline)
        return m_outline;
    return reinterpret_cast<char*>(m_inline);
}

simply clobbers the whole object (with our points-to analysis).

Helping the compiler and doing

auto *b = buffer.m_outline;
for (unsigned i = 0; i < 128; ++i)
  b[i] = 0;

allows it to optimize and avoid the diagnostic.  Using buffer.m_outline
in the memset instead of buffer.data () would probably work as well.

[Bug target/105554] [10/11/12/13 Regression] ICE: in emit_block_move_hints, at expr.cc:1829 since r9-5509-g5928bc2ec06dd4e7

2023-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105554

Martin Liška  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #6 from Martin Liška  ---
We fail in the param assignment:

(gdb) pp x
(reg:V4DI 82)
(gdb) pp y
(mem/c:BLK (reg/f:DI 76 virtual-incoming-args) [1 x+0 S32 A256])

So we will likely need something similar to what we have in tree-inline.cc:

  /* For vector typed decls make sure to update DECL_MODE according
     to the new function context.  */
  if (VECTOR_TYPE_P (TREE_TYPE (copy)))
    SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy)));

@Richi: Do you have a clue where to adjust it?

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #20 from Richard Biener  ---
Fixed.

[Bug tree-optimization/107767] [13 Regression] switch to table conversion happening even though using btq is better

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107767

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:f99d7d669eaa2830eb5878df4da67e77ec791522

commit r13-5106-gf99d7d669eaa2830eb5878df4da67e77ec791522
Author: Richard Biener 
Date:   Mon Jan 9 09:42:22 2023 +0100

tree-optimization/107767 - not profitable switch conversion

When the CFG has not merged equal PHI defs in a switch stmt the
cost model from switch conversion gets off and we prefer a
jump table over branches.  The following fixes that by recording
cases that will be merged later and more appropriately counting
unique values.

PR tree-optimization/107767
* tree-cfgcleanup.cc (phi_alternatives_equal): Export.
* tree-cfgcleanup.h (phi_alternatives_equal): Declare.
* tree-switch-conversion.cc (switch_conversion::collect):
Count unique non-default targets accounting for later
merging opportunities.

* gcc.dg/tree-ssa/pr107767.c: New testcase.

[Bug c++/108365] [9/10/11/12/13 Regression] Wrong code with -O0

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108365

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug target/108308] [13 Regression] wrong code at -Os and -O2 with "-fno-tree-ccp" on x86_64-linux-gnu

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108308

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:b39c1bea5bae9aee1df25cab1064f983b9ec6941

commit r13-5105-gb39c1bea5bae9aee1df25cab1064f983b9ec6941
Author: Jakub Jelinek 
Date:   Wed Jan 11 13:06:14 2023 +0100

testsuite: Enable pr108308.c test on all int32 targets [PR108308]

This test seems to rely on 32-bit int (and uses a wider constant
which shouldn't fit into int), I've initially enabled it on ilp32+lp64
target, but apparently it works on llp64 too, so I've changed it to
int32.

2023-01-11  Jakub Jelinek  

PR target/108308
* gcc.dg/pr108308.c: Use int32 target rather than { ilp32 || lp64 }.
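
The point about data models can be illustrated with a small sketch: ILP32, LP64 and LLP64 differ in the width of long and of pointers, but all three have a 32-bit int, which is the only property the commit message says the test relies on. The helper names below are hypothetical and are not part of the testsuite; pr108308.c itself is not reproduced here.

```cpp
#include <cassert>
#include <climits>

// True when `int` is exactly 32 bits wide.  This holds for the ILP32, LP64
// and LLP64 data models alike; they differ only in `long` and pointer width.
constexpr bool int_is_32bit ()
{
  return sizeof (int) * CHAR_BIT == 32;
}

// True when V is representable as `int`.
constexpr bool fits_in_int (long long v)
{
  return v >= INT_MIN && v <= INT_MAX;
}
```

On a target where int_is_32bit () holds, fits_in_int (0x100000000LL) is false, which is the kind of "wider constant" situation the message describes.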

[Bug middle-end/107976] ICE: SIGSEGV (stack overflow) in emit_case_dispatch_table (stmt.cc:783) with large --param=jump-table-max-growth-ratio-for-speed

2023-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107976

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #6 from Martin Liška  ---
Fixed, not planning to do a backport.

[Bug tree-optimization/108352] [13 Regression] Dead Code Elimination Regression at -O2 since r13-1960-gd86d81a449c036

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108352

Richard Biener  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Richard Biener  ---
Fixed.

[Bug middle-end/107976] ICE: SIGSEGV (stack overflow) in emit_case_dispatch_table (stmt.cc:783) with large --param=jump-table-max-growth-ratio-for-speed

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107976

--- Comment #5 from CVS Commits  ---
The master branch has been updated by Martin Liska:

https://gcc.gnu.org/g:8221efae233e2d5992a79600071dd0a52f1b3c74

commit r13-5104-g8221efae233e2d5992a79600071dd0a52f1b3c74
Author: Martin Liska 
Date:   Wed Dec 28 09:11:40 2022 +0100

switch expansion: limit JT growth param values

Currently, one can request a huge jump table creation which
leads to nonsensically huge output. Moreover, use auto_vec rather
than a stack-allocated array.

PR middle-end/107976

gcc/ChangeLog:

* params.opt: Limit JT params.
* stmt.cc (emit_case_dispatch_table): Use auto_vec.

[Bug c/105972] [12/13 Regression] ICE in lower_stmt, at gimple-low.cc:312 since r12-4608-gb4702276615ff8d4

2023-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105972

Martin Liška  changed:

   What|Removed |Added

 Status|ASSIGNED|NEW
   Assignee|marxin at gcc dot gnu.org  |unassigned at gcc dot gnu.org

--- Comment #4 from Martin Liška  ---
(In reply to Richard Biener from comment #1)
> Confirmed.  Somehow we get in .original
> 
> ;; Function f (null)
> ;; enabled by -tree-original
> 
> 
> {
>   static int g ();
> 
>   # DEBUG BEGIN STMT;
> static int g ();
> }

Hm, here we end up with a nested function whose parsing is probably skipped and
we end up with the wrong options.
Dunno why.

[Bug tree-optimization/108352] [13 Regression] Dead Code Elimination Regression at -O2 since r13-1960-gd86d81a449c036

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108352

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:7c9f20fcfdc2d8453df88ceb7e693debfcd678c0

commit r13-5103-g7c9f20fcfdc2d8453df88ceb7e693debfcd678c0
Author: Richard Biener 
Date:   Wed Jan 11 12:07:16 2023 +0100

tree-optimization/108352 - FSM threads creating irreducible loops

The following relaxes a heuristic that prevents creating irreducible
loops from FSM threads not covering multi-way branches.  Instead of
allowing threads that adhere to

  && (n_insns * (unsigned) param_fsm_scale_path_stmts
  > (m_path.length () *
 (unsigned) param_fsm_scale_path_blocks))

with reasoning "We also consider it worth creating an irreducible inner
loop if the number of copied statement is low relative to the length of
the path -- in that case there's little the traditional loop optimizer
would have done anyway, so an irreducible loop is not so bad." that I
cannot make much sense of, the following patch changes that to only allow
those after loop optimization and when they are (scaled) short:

  && (!(cfun->curr_properties & PROP_loop_opts_done)
  || (m_n_insns * param_fsm_scale_path_stmts
  >= param_max_jump_thread_duplication_stmts)))

This allows us to get rid of --param fsm-scale-path-blocks which
previous to the bisected revision allowed an enlarged path covering
the original allowance (but we do not consider that enlarged path
now because enlarging it doesn't add any information).

PR tree-optimization/108352
* tree-ssa-threadbackward.cc
(back_threader_profitability::profitable_path_p): Adjust
heuristic that allows non-multi-way branch threads creating
irreducible loops.
* doc/invoke.texi (--param fsm-scale-path-blocks): Remove.
(--param fsm-scale-path-stmts): Adjust.
* params.opt (--param=fsm-scale-path-blocks=): Remove.
(-param=fsm-scale-path-stmts=): Adjust description.

* gcc.dg/tree-ssa/ssa-thread-21.c: New testcase.
* gcc.dg/tree-ssa/vrp46.c: Remove --param fsm-scale-path-blocks=1.

[Bug tree-optimization/108367] [12/13 Regression] ICE: verify_ssa failed (error: definition in block 4 does not dominate use in block 3) since r12-5138-ge82c382971664d6f

2023-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108367

Martin Liška  changed:

   What|Removed |Added

   Last reconfirmed||2023-01-11
Summary|[12/13 Regression] ICE: |[12/13 Regression] ICE:
   |verify_ssa failed (error:   |verify_ssa failed (error:
   |definition in block 4 does  |definition in block 4 does
   |not dominate use in block   |not dominate use in block
   |3)  |3) since
   ||r12-5138-ge82c382971664d6f
   Target Milestone|--- |12.3
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
 CC||aldyh at gcc dot gnu.org,
   ||marxin at gcc dot gnu.org

--- Comment #1 from Martin Liška  ---
Started with r12-5138-ge82c382971664d6f.

[Bug c++/108365] [9/10/11/12/13 Regression] Wrong code with -O0

2023-01-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108365

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Jakub Jelinek  ---
Created attachment 54243
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54243&action=edit
gcc13-pr108365.patch

Untested fix.

[Bug tree-optimization/108367] New: [12/13 Regression] ICE: verify_ssa failed (error: definition in block 4 does not dominate use in block 3)

2023-01-11 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108367

Bug ID: 108367
   Summary: [12/13 Regression] ICE: verify_ssa failed (error:
definition in block 4 does not dominate use in block
3)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code, openacc
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

gcc 13.0.0 20230108 snapshot (g:e3a4bd0bbdccdde0cff85f93064b01a44fb10d2a) ICEs
when compiling libgomp/testsuite/libgomp.oacc-c-c++-common/pr89376.c w/ -Os
-fopenacc -fno-tree-fre -fno-tree-vrp -g:

% gcc-13 -Os -fopenacc -fno-tree-fre -fno-tree-vrp -g -c libgomp/testsuite/libgomp.oacc-c-c++-common/pr89376.c
libgomp/testsuite/libgomp.oacc-c-c++-common/pr89376.c: In function 'main._omp_fn.0':
libgomp/testsuite/libgomp.oacc-c-c++-common/pr89376.c:14:1: error: definition in block 4 does not dominate use in block 3
   14 | }
      | ^
for SSA_NAME: rw_7 in statement:
# DEBUG rw => rw_7
during GIMPLE pass: parloops
libgomp/testsuite/libgomp.oacc-c-c++-common/pr89376.c:14:1: internal compiler error: verify_ssa failed
0x1149446 verify_ssa(bool, bool)
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20230108/work/gcc-13-20230108/gcc/tree-ssa.cc:1211
0xdfea25 execute_function_todo
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20230108/work/gcc-13-20230108/gcc/passes.cc:2098
0xdfee5e execute_todo
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20230108/work/gcc-13-20230108/gcc/passes.cc:2145

[Bug tree-optimization/108137] [12 Regression] ICE: segfault during GIMPLE pass: warn-printf since r12-523-g2254b3233b5bfa69

2023-01-11 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108137

Martin Liška  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #12 from Martin Liška  ---
Fixed.

[Bug tree-optimization/108137] [12 Regression] ICE: segfault during GIMPLE pass: warn-printf since r12-523-g2254b3233b5bfa69

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108137

--- Comment #11 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Martin Liska:

https://gcc.gnu.org/g:bd4c310b06d747975853ac6dfef6da120c13f6ec

commit r12-9040-gbd4c310b06d747975853ac6dfef6da120c13f6ec
Author: Martin Liska 
Date:   Fri Dec 23 15:27:32 2022 +0100

strlen: do not use cond_expr for boundaries

PR tree-optimization/108137

gcc/ChangeLog:

* tree-ssa-strlen.cc (get_range_strlen_phi): Reject anything
different from INTEGER_CST.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr108137.c: New test.

(cherry picked from commit ee6f262b87fef590729e96e999f1c3b207c251c0)

[Bug tree-optimization/108352] [13 Regression] Dead Code Elimination Regression at -O2 since r13-1960-gd86d81a449c036

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108352

Richard Biener  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #2 from Richard Biener  ---
Checking profitability of path (backwards):  bb:3 (6 insns) bb:9 (0 insns) (latch) bb:5
  Control statement insns: 2
  Overall: 4 insns
  [4] Registering jump thread: (5, 9) incoming edge;  (9, 3) normal (back) (3, 4) nocopy;
path: 5->9->3->4 SUCCESS

but

Checking profitability of path (backwards):  bb:3 (6 insns) bb:9 (latch)
  Control statement insns: 2
  Overall: 4 insns
  FAIL: Would create irreducible loop without threading multiway branch.
path: 9->3->xx REJECTED

we are no longer considering the first which just adds an unrelated jump
to the path after the patch.  That's the

  /* We avoid creating irreducible inner loops unless we thread through
     a multiway branch, in which case we have deemed it worth losing
     other loop optimizations later.

     We also consider it worth creating an irreducible inner loop if
     the number of copied statement is low relative to the length of
     the path -- in that case there's little the traditional loop
     optimizer would have done anyway, so an irreducible loop is not
     so bad.  */
  if (!threaded_multiway_branch
      && creates_irreducible_loop
      && *creates_irreducible_loop
      && (n_insns * (unsigned) param_fsm_scale_path_stmts
          > (m_path.length () *
             (unsigned) param_fsm_scale_path_blocks)))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
                 "  FAIL: Would create irreducible loop without threading "
                 "multiway branch.\n");
      return false;

heuristic which with 9 -> 3 is 4 * 2 > 2 * 3 but with 5 -> 9 -> 3 we
get 4 * 2 > 3 * 3.

It's also worth noting that neither of the two threads creates an irreducible
loop in the end for this particular case since e is also constant on entry
and thus the jump is resolved and the extra loop entry is removed (but
that's out of scope of the threader's analysis here).

It IMHO still makes no sense to reject the shorter path over the longer one
so the above "heuristic" makes absolutely no sense to me.  Raising
--param fsm-scale-path-blocks to 4 "fixes" the testcase on trunk.
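
To make the arithmetic above concrete, here is a minimal sketch of just the size comparison from the quoted condition, detached from GCC internals. The function name is hypothetical, and the default values 2 and 3 are only inferred from the "4 * 2 > 2 * 3" figures in this comment, not taken from the GCC documentation. The other conjuncts (no multiway branch threaded, an irreducible loop would be created) are assumed to hold.

```cpp
#include <cassert>

// Returns true when the size comparison would make the old heuristic
// REJECT the thread.  scale_stmts and scale_blocks stand in for
// --param fsm-scale-path-stmts and --param fsm-scale-path-blocks;
// the defaults here are assumptions read off the numbers in this comment.
bool old_heuristic_rejects (unsigned n_insns, unsigned path_blocks,
                            unsigned scale_stmts = 2,
                            unsigned scale_blocks = 3)
{
  assert (path_blocks > 0);  // a path has at least one block
  return n_insns * scale_stmts > path_blocks * scale_blocks;
}
```

With these assumed values, old_heuristic_rejects (4, 2) is true (8 > 6, path 9 -> 3 rejected) while old_heuristic_rejects (4, 3) is false (8 > 9 fails, path 5 -> 9 -> 3 accepted). Passing scale_blocks = 4 for the two-block path gives 8 > 8, which is false, matching the observation that raising the param to 4 lets the shorter path through.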

The heuristic was added in r6-6600-g2b572b3c213b51 by Jeff in the attempt
to address a coremark regression (PR68398).  I guess Jeff remembers nothing
about this.

Note this is not about adding inner irreducible loops but about making the
loop itself irreducible.  The length of the path itself also says nothing
about the length of a path through the irreducible loop ...

Reverting the heuristic will reject all non-multi-way branch irreducible
loop creation.  We have another heuristic that rejects threading through
the latch early:

  /* Threading through an empty latch would cause code to be added to
     the latch.  This could alter the loop form sufficiently to cause
     loop optimizations to fail.  Disable these threads until after
     loop optimizations have run.  */
  if ((threaded_through_latch
       || (taken_edge && taken_edge->dest == loop->latch))
      && !(cfun->curr_properties & PROP_loop_opts_done)
      && empty_block_p (loop->latch))

so we could reject irreducible loops before loop opts (w/o just covering
the empty latch case) and otherwise generally allow it even for
non-multi-way branches.

That said, I fear I'm going to replace one bogus heuristic with another ;)

I'm still going to test replacing the heuristic with the following
(which allows removing the fsm-scale-path-blocks param).

  /* We avoid creating irreducible inner loops unless we thread through
     a multiway branch, in which case we have deemed it worth losing
     other loop optimizations later.

     We also consider it worth creating an irreducible inner loop after
     loop optimizations if the number of copied statement is low.  */
  if (!m_threaded_multiway_branch
      && *creates_irreducible_loop
      && (!(cfun->curr_properties & PROP_loop_opts_done)
          || (m_n_insns * param_fsm_scale_path_stmts
              >= param_max_jump_thread_duplication_stmts)))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file,
                 "  FAIL: Would create irreducible loop early without "
                 "threading multiway branch.\n");
      /* We compute creates_irreducible_loop only late.  */
      return false;
    }
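
The proposed condition can be read as a pure predicate. The sketch below restates it outside GCC so the cases are easy to enumerate; the function and parameter names are hypothetical, and the param values are passed in explicitly rather than assumed.

```cpp
#include <cassert>

// Returns true when the proposed check would REJECT the thread.
// loop_opts_done models (cfun->curr_properties & PROP_loop_opts_done);
// scale_stmts and max_dup_stmts model param_fsm_scale_path_stmts and
// param_max_jump_thread_duplication_stmts.
bool new_heuristic_rejects (bool threaded_multiway_branch,
                            bool creates_irreducible_loop,
                            bool loop_opts_done,
                            int n_insns, int scale_stmts,
                            int max_dup_stmts)
{
  assert (n_insns >= 0 && scale_stmts >= 0);
  return !threaded_multiway_branch
         && creates_irreducible_loop
         && (!loop_opts_done
             || n_insns * scale_stmts >= max_dup_stmts);
}
```

For example, with the purely illustrative values scale_stmts = 2 and max_dup_stmts = 15, a 4-insn thread creating an irreducible loop is rejected before loop optimizations and accepted afterwards (8 < 15), while an 8-insn thread is rejected even afterwards (16 >= 15). A thread through a multiway branch is never rejected by this check.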

[Bug tree-optimization/108353] [13 Regression] Dead Code Elimination Regression at -O2 since r13-3898-gaf96500eea72c6

2023-01-11 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108353

--- Comment #6 from CVS Commits  ---
The master branch has been updated by Richard Biener:

https://gcc.gnu.org/g:445a48a226ffd530b37bcdc13b6bdca94ba2e122

commit r13-5102-g445a48a226ffd530b37bcdc13b6bdca94ba2e122
Author: Richard Biener 
Date:   Wed Jan 11 09:32:36 2023 +0100

tree-optimization/108353 - copyprop iteration order

After recent improvements to copyprop to catch more constants
it shows that the current iteration order preferring forward
progress over iterating doesn't make much sense for an SSA
propagator.  The following instead first iterates cycles which
makes sure to not start with optimistically constant PHIs out
of cycles that optimistically do not exit.

PR tree-optimization/108353
* tree-ssa-propagate.cc (cfg_blocks_back, ssa_edge_worklist_back):
Remove.
(add_ssa_edge): Simplify.
(add_control_edge): Likewise.
(ssa_prop_init): Likewise.
(ssa_prop_fini): Likewise.
(ssa_propagation_engine::ssa_propagate): Likewise.

* gcc.dg/tree-ssa/ssa-copyprop-3.c: New testcase.

[Bug tree-optimization/108353] [13 Regression] Dead Code Elimination Regression at -O2 since r13-3898-gaf96500eea72c6

2023-01-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108353

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Richard Biener  ---
Fixed.
