[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832 --- Comment #21 from Alexander Monakov --- (In reply to Michael_S from comment #19) > > Also note that 'vfnmadd231pd 32(%rdx,%rax), %ymm3, %ymm0' would be > > 'unlaminated' (turned to 2 uops before renaming), so selecting independent > > IVs for

[Bug middle-end/107905] 2x slowdown versus CLANG and ICL

2022-11-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905 --- Comment #5 from Alexander Monakov --- Not sure what you don't like about the inputs, they appear quite reasonable. Perhaps GCC's estimation of bb frequencies is off (with profile feedback we achieve good performance). Georgi: you'll likely

[Bug tree-optimization/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

2023-01-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487 Alexander Monakov changed: What|Removed |Added Component|rtl-optimization|tree-optimization

[Bug target/108491] cross compiler does not work: cc1: error: ‘-msecure-plt’ not supported by your assembler

2023-01-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108491 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug libstdc++/108487] [10/11/12/13 Regression] ~20-30x slowdown in populating std::vector from std::ranges::iota_view

2023-01-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487 Alexander Monakov changed: What|Removed |Added Component|tree-optimization |libstdc++ --- Comment #3 from

[Bug rtl-optimization/108519] [13 regression] gcc.target/powerpc/pr105586.c fails after r13-5154-g733a1b777f16cd

2023-01-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519 --- Comment #1 from Alexander Monakov --- We diverge in sched1 due to extra calls to advance_one_cycle when scheduling a BB that is empty apart from one debug insn. The following patch adds a hexdump of automaton state to make the problem

[Bug rtl-optimization/108519] [13 regression] gcc.target/powerpc/pr105586.c fails after r13-5154-g733a1b777f16cd

2023-01-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519 --- Comment #3 from Alexander Monakov --- Ah, a worthy sequel to "Note that I wasn't able to figure out a usable email address for the submitter" from PR 107353. Nevermind then.

[Bug libgomp/108494] Slow thread creation with nested loops in GFortran

2023-01-23 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/108401] gcc defeats vector constant generation with intrinsics

2023-01-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug other/107621] spinx generated documents has too much white space on the top

2022-11-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107621 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/107505] [13 Regression] ICE: verify_flow_info failed (error: returns_twice call is not first in basic block 2)

2022-11-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107505 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/87832] AMD pipeline models are very costly size-wise

2022-11-07 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832 --- Comment #3 from Alexander Monakov --- Followup patches have been posted at https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amona...@ispras.ru/

[Bug tree-optimization/107505] [13 Regression] ICE: verify_flow_info failed (error: returns_twice call is not first in basic block 2)

2022-11-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107505 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 --- Comment #15 from Alexander Monakov --- Ah, there will be an mfence after the vmovdqa when necessary for an atomic store, thanks (I missed that because the testcase doesn't scan for mfence).

[Bug target/104688] gcc and libatomic can use SSE for 128-bit atomic loads on Intel and AMD CPUs with AVX

2022-11-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/107647] [12/13 Regression] GCC 12.2.0 may produce FMAs even with -ffp-contract=off

2022-11-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/107647] [12/13 Regression] GCC 12.2.0 may produce FMAs even with -ffp-contract=off

2022-11-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647 --- Comment #6 from Alexander Monakov --- Sure, but I was talking specifically about the pattern matching introduced by that commit.

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #18 from Alexander Monakov --- It seems you are saying that as long as GCC emits code according to the Holy Scripture that is the ABI spec, everything is fine. I imagine on other architectures maintainers are able to consider how

[Bug rtl-optimization/109187] [13 Regression] ICE: qsort checking failed: qsort comparator non-negative on sorted output: 1736258160 at -O2 since r13-5154-g733a1b777f16cd

2023-03-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187 --- Comment #2 from Alexander Monakov --- This is caused by overflowing subtraction in autopref_rank_for_schedule: if (!irrel1 && !irrel2) /* Sort memory references from lowest offset to the largest. */ r = data1->offset

[Bug target/109273] New: [11/12/13 Regression] unaligned stp generated with -mstrict-align

2023-03-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109273 Bug ID: 109273 Summary: [11/12/13 Regression] unaligned stp generated with -mstrict-align Product: gcc Version: 13.0 Status: UNCONFIRMED Severity: normal

[Bug rtl-optimization/109187] [13 Regression] ICE: qsort checking failed: qsort comparator non-negative on sorted output: 1736258160 at -O2 since r13-5154-g733a1b777f16cd

2023-03-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187 --- Comment #3 from Alexander Monakov --- The reduced case is offsetting stack variables in a manner that seems too invalid for my taste, so I plan to send a patch with a following testcase instead (needs -O2 --param

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #7 from Alexander Monakov --- I saw that. That's why I'm pointing out that Glibc (and musl) uses the instruction without any additional checks: real CPUs produce the expected result in st(0), despite the documentation making no

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #4 from Alexander Monakov --- Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64, as well as for {fmod,remainder,remquo}{,f,l} on i386 without any branches for corner cases. So in practice CPUs apparently implement the

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #9 from Alexander Monakov --- (In reply to Jan Kratochvil from comment #8) > The revert makes it 13x faster. But the produced code still falls back to > calling glibc fmod() as shown in the disassembly in Comment 0. > If I use the

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #15 from Alexander Monakov --- That is the fancy-error-handling path that is reached under _LIB_VERSION != _IEEE_. Before glibc-2.27, linking with -lieee would set _LIB_VERSION = _IEEE_, and then glibc would use the fprem[1]

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #22 from Alexander Monakov --- Strange, comment #8 claims the opposite (unless Jan tested the revert not on trunk, but on some branch).

[Bug target/108922] fmod() 13x slowdown in gcc4.9 dropping "fprem" and calling fmod()

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922 --- Comment #19 from Alexander Monakov --- I get the feeling that you're ignoring me, but gcc-4.8.3 was already emitting a helper fmod call for setting errno without any flag_errno_math checks in i386.md, i.e. it was already in the middle-end.

[Bug target/108315] -mcpu=power10 changes ABI

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #4 from Alexander Monakov --- Let me address one point separately: (In reply to Peter Bergner from comment #1) > CCing Alan, since he probably knows best how this all works, but yes, > -mcpu-power10 changes the ABI, namely it adds

[Bug target/108315] -mcpu=power10 changes ABI

2023-02-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #3 from Alexander Monakov --- Alan implemented the special case of .localentry 1 in this patch for the BFD linker (that appeared in binutils 2.32 if my calculations are correct):

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #10 from Alexander Monakov --- (In reply to Rui Ueyama from comment #9) > I'm the maintainer of the mold linker. I didn't implement that POWER10 ABI > because I didn't have an access to a POWER10 machine and therefore couldn't >

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 Alexander Monakov changed: What|Removed |Added Resolution|INVALID |--- Status|RESOLVED

[Bug target/108315] -mcpu=power10 changes ABI

2023-03-03 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315 --- Comment #14 from Alexander Monakov --- Are you guys really sure you want to blame the user here, considering that all linkers, including the BFD linker, initially misinterpreted the ABI the same way?

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-13 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369 --- Comment #9 from Alexander Monakov --- (In reply to Pali Rohár from comment #8) > So from the discussion, do I understand correctly that this is rather LD > linker issue? Yes, ld changes will be needed to make this work automatically,

[Bug tree-optimization/109587] Deeply nested loop unrolling overwhelms register allocator with -O3

2023-04-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/109585] [10/11/12/13/14 regression] Carla/sord miscompiled with -O2 on ARM64 with flexible array member

2023-04-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109585 --- Comment #19 from Alexander Monakov --- Manually minimized testcase for investigation, miscompiled at -O2: struct P { long v; struct P *n; }; struct F { long x; struct P fam[]; }; int f(struct F *f, int i)

[Bug rtl-optimization/109585] Carla/sord miscompiled with -O2 on ARM64 with flexible array member

2023-04-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109585 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug libgomp/109634] Linking Imagick for PHP compiles fine but gives segfault caused by libgomp on runtime

2023-04-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109634 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug lto/109368] LTO drops entry point symbol

2023-04-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109368 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369 --- Comment #5 from Alexander Monakov --- Indeed, sorry, __attribute__((used)) seems a much better solution for symbols that might be referenced implicitly, in a manner that LTO plugin cannot see.

[Bug lto/109369] LTO drops explicitly referenced symbol _pei386_runtime_relocator

2023-04-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369 --- Comment #7 from Alexander Monakov --- Yes, ld should claim _pei386_runtime_relocator (even if later it becomes unneeded due to zero relocations left to fix up) to make this work properly. That's for Binutils to fix on their side.

[Bug tree-optimization/109469] [13 regression] ICE: internal compiler error: verify_flow_info failed (error: returns_twice call is not first in basic block 2)

2023-04-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109469 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/109477] [13 regression] ICE: internal compiler error: verify_flow_info failed (error: returns_twice call is not first in basic block 8) when building busybox

2023-04-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109477 Alexander Monakov changed: What|Removed |Added Resolution|--- |DUPLICATE CC|

[Bug tree-optimization/109469] [13 regression] ICE: internal compiler error: verify_flow_info failed (error: returns_twice call is not first in basic block 2) when building xdvik

2023-04-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109469 --- Comment #8 from Alexander Monakov --- *** Bug 109477 has been marked as a duplicate of this bug. ***

[Bug bootstrap/84402] [meta] GCC build system: parallelism bottleneck

2023-03-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/109187] [13 Regression] ICE: qsort checking failed: qsort comparator non-negative on sorted output: 1736258160 at -O2 since r13-5154-g733a1b777f16cd

2023-03-28 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187 Alexander Monakov changed: What|Removed |Added Resolution|--- |FIXED Status|ASSIGNED

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug web/110250] Broken url to README in st/cli-be project

2023-06-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110250 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c/110249] __builtin_unreachable helps optimisation at -O1 but not at -O2

2023-06-14 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110260] Multiple applications misbehave when compiled with -march=znver4

2023-06-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260 --- Comment #6 from Alexander Monakov --- (In reply to Jimi Huotari from comment #0) > (By the by, is ADCX a typo of ADX? I see -madx as an option but only one > use of it otherwise, and no -adcx as an option and lots of mentions of it... >

[Bug target/110260] Multiple applications misbehave when compiled with -march=znver4

2023-06-15 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-07-04 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 --- Comment #3 from Alexander Monakov --- Patch available: https://inbox.sourceware.org/gcc-patches/8f73371d732237ed54ede44b7bd88...@ispras.ru/T/#u

[Bug target/110611] X86 is not honouring POINTERS_EXTEND_UNSIGNED in m32 code.

2023-07-10 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110611 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/109982] csmith: x86_64: znver1 issues

2023-05-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/109982] csmith: x86_64: znver1 issues

2023-05-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982 --- Comment #13 from Alexander Monakov --- No, neither for fields nor for the complete object: struct __attribute__((aligned(64))) S { int i; }; void f() { struct S s __attribute__((aligned(1))), *p = int *q = asm("" ::

[Bug target/109982] csmith: x86_64: znver1 issues

2023-05-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982 --- Comment #15 from Alexander Monakov --- For '--float' I think runtime differences are expected when you pass -m flags that enable FMA, unless you also pass '-ffp-contract=off'. For '--compiler-attributes' I'd suggest reporting only compiler

[Bug middle-end/110052] useless local variable not optimized away

2023-05-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110052 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug libstdc++/110054] stdx::simd masked store should not use non-temporal store instruction

2023-05-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110054 Alexander Monakov changed: What|Removed |Added Keywords||wrong-code CC|

[Bug c/110053] csmith: problems with -O1 and -O2 in same file

2023-05-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110053 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/110087] Missing if conversion

2023-06-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110087 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/110089] sub-optimal code for attempting to produce JNA (jump on CF or ZF)

2023-06-02 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110089 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/109944] vector CTOR with byte elements and SSE2 has STLF fail

2023-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c/109956] GCC reserves 9 bytes for struct s { int a; char b; char t[]; } x = {1, 2, 3};

2023-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug c/110169] wrong code with '-Ofast'

2023-06-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110169 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/109967] [10/11/12/13/14 Regression] Wrong code at -O2 on x86_64-linux-gnu

2023-06-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug middle-end/110052] useless local variable not optimized away

2023-06-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110052 --- Comment #5 from Alexander Monakov --- There are other reasons why it's invalid. For instance, in a multi-threaded program it could introduce a data race on assignment to foo->size inside of 'myrealloc' where the original program might have

[Bug middle-end/110069] [Perf] -finstrument-functions causes program size to double

2023-06-01 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110069 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/110035] Missed optimization for dependent assignment statements

2023-06-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035 --- Comment #15 from Alexander Monakov --- malloc and friends modify 'errno' on failure, so in they would have to be special-cased for alias analysis.

[Bug c/110007] Implement support for Clang’s __builtin_unpredictable()

2023-05-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug tree-optimization/109950] can array subscripts be assumed to be non-negative?

2023-05-24 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109950 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110762] inappropriate use of SSE (or AVX) insns for v2sf mode operations

2023-07-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762 --- Comment #14 from Alexander Monakov --- That seems undesirable in light of comment #4, you'd risk creating a situation when -fno-trapping-math is unpredictably slower when denormals appear in dirty upper halves.

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-25 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #9 from Alexander Monakov --- (In reply to Tom de Vries from comment #7) > Can you elaborate on what you consider a correct approach? I think this optimization is incorrect and should be active only under -Ofast. I can offer two

[Bug rtl-optimization/110823] [missed optimization] >50% speedup for x86-64 ASCII processing a la GNU diffutils

2023-07-30 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug sanitizer/110799] [tsan] False positive due to -fhoist-adjacent-loads

2023-07-31 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799 --- Comment #16 from Alexander Monakov --- In C11 and C++11 the issue of compiler-introduced racing loads is discussed as follows (5.1.2.4 Multi-threaded executions and data races in C11): 28 NOTE 14 Transformations that introduce a

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #11 from Alexander Monakov --- (In reply to Alexander Monakov from comment #8) > inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) > { > memcpy(p, , sizeof(x)); > } > > > We deciding to not inline this, while

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #10 from Alexander Monakov --- Ah, the non-static inlines are intentional, the corresponding extern declarations appear in library/platform_util.c. Sorry, I missed that file the first time around.

[Bug target/110979] Miss-optimization for O2 fully masked loop on floating point reduction.

2023-08-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979 --- Comment #2 from Alexander Monakov --- Yes, it is wrong-code to full extent. To demonstrate, you can initialize 'sum' and the array to negative zeroes: #define FLT double #define N 20 __attribute__((noipa)) FLT foo3 (FLT *a) { FLT sum

[Bug target/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-08-05 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 Alexander Monakov changed: What|Removed |Added Status|NEW |RESOLVED Resolution|---

[Bug target/110926] [14 regression] Bootstrap failure (matmul_i1.c:1781:1: internal compiler error: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in vpternlog_redundant_operand_m

2023-08-06 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug other/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #8 from Alexander Monakov --- Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a straightforward wrapper for memcpy: inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x) { memcpy(p, ,

[Bug ipa/110946] 3x perf regression with -Os on M1 Pro

2023-08-08 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946 --- Comment #9 from Alexander Monakov --- (In reply to Alexander Monakov from comment #2) > Note that inline functions in mbedtls/library/alignment.h all miss the > 'static' qualifier, which affects inlining decisions, and looks like a >

[Bug target/110273] i686-w64-mingw32 with -march=znver4 generates AVX instructions without stack alignment

2023-06-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #3 from Alexander Monakov --- Seems to work fine with explicit '-mincoming-stack-boundary=2' on the command line, even though it should make no difference for the 32-bit MinGW target.

[Bug target/110273] i686-w64-mingw32 with -march=znver4 generates AVX instructions without stack alignment

2023-06-16 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #4 from Alexander Monakov --- Further reduced: void f() { int c[4] = { 0, 0, 0, 0 }; int cc[8] = { 0 }; asm("" :: "m"(c), "m"(cc)); } Also reproducible with -march=skylake-avx512 or even plain -mavx512f, retitling.

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-21 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #8 from Alexander Monakov --- REG_EH_REGION is handled further down that function, but copy_reg_eh_region_note_backward does not copy the note. Perhaps it needs diff --git a/gcc/except.cc b/gcc/except.cc index

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #10 from Alexander Monakov --- I think the first patch may result in duplicated notes, so I wouldn't recommend picking it.

[Bug tree-optimization/110369] wrong code on x86_64-linux-gnu

2023-06-22 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110369 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug target/110273] [12/13/14 Regression] i686-w64-mingw32 with -mavx512f generates AVX instructions without stack alignment

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273 --- Comment #8 from Alexander Monakov --- (In reply to Sam James from comment #7) > We keep getting quite a few reports of this downstream. Of this mingw32 stack realignment issue specifically, i.e. Wine breakage when AVX512 is enabled via

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #11 from Alexander Monakov --- The trapping angle seems valid, but I have a really hard time understanding the DSE issue, and the preceding issue about disambiguation based on RTL aliasing. How would DSE optimize out 'd[5] = 1' in

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #13 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #12) > As explained in comment#3 the issue is related to the tree alias oracle > part that gets invoked on the MEM_EXPR for the load where there is > no

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #16 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #14) > vectors of T and scalar T interoperate TBAA wise. What we disambiguate is > > int a[2]; > > int foo(int *p) > { > a[0] = 1; > *(v4si *)p =

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 on alpha with -fPIC -fpeephole2 -fschedule-insns2

2023-06-19 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 Alexander Monakov changed: What|Removed |Added CC||amonakov at gcc dot gnu.org ---

[Bug rtl-optimization/110307] ICE in move_insn, at haifa-sched.cc:5473 when building Ruby on alpha with -fPIC -O2 (or -fpeephole2 -fschedule-insns2)

2023-06-20 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307 --- Comment #5 from Alexander Monakov --- It's not necessary yet for this particular bug, but might be helpful for future bugs (if disk space is not an issue).

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-26 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237 --- Comment #18 from Alexander Monakov --- (In reply to rguent...@suse.de from comment #17) > Yes, we do the same to loads. I hope that's not a common technique > though but I have to admit the vectorizer itself assesses whether it's > safe to

[Bug rtl-optimization/110202] _mm512_ternarylogic_epi64 generates unnecessary operations

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202 --- Comment #7 from Alexander Monakov --- Note that vpxor serves as a dependency-breaking instruction (see PR 110438). So in negate1 we do the right thing for the wrong reasons, and in negate2 we can cause a substantial stall if the previous

[Bug target/110438] generating all-ones zmm needs dep-breaking pxor before ternlog

2023-06-27 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438 --- Comment #1 from Alexander Monakov --- We might want to omit PXOR when optimizing for size.

<    1   2   3   4   >