https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832
--- Comment #21 from Alexander Monakov ---
(In reply to Michael_S from comment #19)
> > Also note that 'vfnmadd231pd 32(%rdx,%rax), %ymm3, %ymm0' would be
> > 'unlaminated' (turned to 2 uops before renaming), so selecting independent
> > IVs for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107905
--- Comment #5 from Alexander Monakov ---
Not sure what you don't like about the inputs, they appear quite reasonable.
Perhaps GCC's estimation of bb frequencies is off (with profile feedback we
achieve good performance).
Georgi: you'll likely
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487
Alexander Monakov changed:
What|Removed |Added
Component|rtl-optimization|tree-optimization
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108491
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108487
Alexander Monakov changed:
What|Removed |Added
Component|tree-optimization |libstdc++
--- Comment #3 from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519
--- Comment #1 from Alexander Monakov ---
We diverge in sched1 due to extra calls to advance_one_cycle when scheduling a
BB that is empty apart from one debug insn. The following patch adds a hexdump
of automaton state to make the problem
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108519
--- Comment #3 from Alexander Monakov ---
Ah, a worthy sequel to "Note that I wasn't able to figure out a usable email
address for the submitter" from PR 107353. Nevermind then.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108401
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107621
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107505
Alexander Monakov changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87832
--- Comment #3 from Alexander Monakov ---
Followup patches have been posted at
https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amona...@ispras.ru/
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107505
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
--- Comment #15 from Alexander Monakov ---
Ah, there will be an mfence after the vmovdqa when necessary for an atomic
store, thanks (I missed that because the testcase doesn't scan for mfence).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107647
--- Comment #6 from Alexander Monakov ---
Sure, but I was talking specifically about the pattern matching introduced by
that commit.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
--- Comment #18 from Alexander Monakov ---
It seems you are saying that as long as GCC emits code according to the Holy
Scripture that is the ABI spec, everything is fine. I imagine on other
architectures maintainers are able to consider how
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187
--- Comment #2 from Alexander Monakov ---
This is caused by overflowing subtraction in autopref_rank_for_schedule:
if (!irrel1 && !irrel2)
/* Sort memory references from lowest offset to the largest. */
r = data1->offset
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109273
Bug ID: 109273
Summary: [11/12/13 Regression] unaligned stp generated with
-mstrict-align
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187
--- Comment #3 from Alexander Monakov ---
The reduced case is offsetting stack variables in a manner that seems too
invalid for my taste, so I plan to send a patch with the following testcase
instead (needs -O2 --param
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #7 from Alexander Monakov ---
I saw that. That's why I'm pointing out that Glibc (and musl) uses the
instruction without any additional checks: real CPUs produce the expected
result in st(0), despite the documentation making no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #4 from Alexander Monakov ---
Plus, Glibc does use fprem/fprem1 for fmodl/remainderl on x86_64, as well as
for {fmod,remainder,remquo}{,f,l} on i386 without any branches for corner
cases. So in practice CPUs apparently implement the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #9 from Alexander Monakov ---
(In reply to Jan Kratochvil from comment #8)
> The revert makes it 13x faster. But the produced code still falls back to
> calling glibc fmod() as shown in the disassembly in Comment 0.
> If I use the
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #15 from Alexander Monakov ---
That is the fancy-error-handling path that is reached under _LIB_VERSION !=
_IEEE_. Before glibc-2.27, linking with -lieee would set _LIB_VERSION = _IEEE_,
and then glibc would use the fprem[1]
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #22 from Alexander Monakov ---
Strange, comment #8 claims the opposite (unless Jan tested the revert not on
trunk, but on some branch).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108922
--- Comment #19 from Alexander Monakov ---
I get the feeling that you're ignoring me, but gcc-4.8.3 was already emitting a
helper fmod call for setting errno without any flag_errno_math checks in
i386.md, i.e. it was already in the middle-end.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
--- Comment #4 from Alexander Monakov ---
Let me address one point separately:
(In reply to Peter Bergner from comment #1)
> CCing Alan, since he probably knows best how this all works, but yes,
> -mcpu=power10 changes the ABI, namely it adds
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
--- Comment #3 from Alexander Monakov ---
Alan implemented the special case of .localentry 1 in this patch for the BFD
linker (that appeared in binutils 2.32 if my calculations are correct):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
--- Comment #10 from Alexander Monakov ---
(In reply to Rui Ueyama from comment #9)
> I'm the maintainer of the mold linker. I didn't implement that POWER10 ABI
> because I didn't have an access to a POWER10 machine and therefore couldn't
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
Alexander Monakov changed:
What|Removed |Added
Resolution|INVALID |---
Status|RESOLVED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108315
--- Comment #14 from Alexander Monakov ---
Are you guys really sure you want to blame the user here, considering that all
linkers, including the BFD linker, initially misinterpreted the ABI the same
way?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
--- Comment #9 from Alexander Monakov ---
(In reply to Pali Rohár from comment #8)
> So from the discussion, do I understand correctly that this is rather LD
> linker issue?
Yes, ld changes will be needed to make this work automatically,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109587
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109585
--- Comment #19 from Alexander Monakov ---
Manually minimized testcase for investigation, miscompiled at -O2:
struct P {
long v;
struct P *n;
};
struct F {
long x;
struct P fam[];
};
int f(struct F *f, int i)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109585
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109634
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109368
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
--- Comment #5 from Alexander Monakov ---
Indeed, sorry, __attribute__((used)) seems a much better solution for symbols
that might be referenced implicitly, in a manner that LTO plugin cannot see.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109369
--- Comment #7 from Alexander Monakov ---
Yes, ld should claim _pei386_runtime_relocator (even if later it becomes
unneeded due to zero relocations left to fix up) to make this work properly.
That's for Binutils to fix on their side.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109469
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109477
Alexander Monakov changed:
What|Removed |Added
Resolution|--- |DUPLICATE
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109469
--- Comment #8 from Alexander Monakov ---
*** Bug 109477 has been marked as a duplicate of this bug. ***
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109187
Alexander Monakov changed:
What|Removed |Added
Resolution|--- |FIXED
Status|ASSIGNED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110250
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110249
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260
--- Comment #6 from Alexander Monakov ---
(In reply to Jimi Huotari from comment #0)
> (By the by, is ADCX a typo of ADX? I see -madx as an option but only one
> use of it otherwise, and no -adcx as an option and lots of mentions of it...
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110260
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438
--- Comment #3 from Alexander Monakov ---
Patch available:
https://inbox.sourceware.org/gcc-patches/8f73371d732237ed54ede44b7bd88...@ispras.ru/T/#u
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110611
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982
--- Comment #13 from Alexander Monakov ---
No, neither for fields nor for the complete object:
struct
__attribute__((aligned(64)))
S {
int i;
};
void f()
{
struct S s __attribute__((aligned(1))), *p = &s;
int *q =
asm("" ::
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109982
--- Comment #15 from Alexander Monakov ---
For '--float' I think runtime differences are expected when you pass -m flags
that enable FMA, unless you also pass '-ffp-contract=off'.
For '--compiler-attributes' I'd suggest reporting only compiler
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110052
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110054
Alexander Monakov changed:
What|Removed |Added
Keywords||wrong-code
CC|
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110053
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110087
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110089
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109944
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109956
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110169
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109967
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110052
--- Comment #5 from Alexander Monakov ---
There are other reasons why it's invalid. For instance, in a multi-threaded
program it could introduce a data race on assignment to foo->size inside of
'myrealloc' where the original program might have
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110069
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110035
--- Comment #15 from Alexander Monakov ---
malloc and friends modify 'errno' on failure, so they would have to be
special-cased for alias analysis.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110007
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109950
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110762
--- Comment #14 from Alexander Monakov ---
That seems undesirable in light of comment #4: you'd risk creating a situation
where -fno-trapping-math is unpredictably slower when denormals appear in dirty
upper halves.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
--- Comment #9 from Alexander Monakov ---
(In reply to Tom de Vries from comment #7)
> Can you elaborate on what you consider a correct approach?
I think this optimization is incorrect and should be active only under -Ofast.
I can offer two
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110823
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110799
--- Comment #16 from Alexander Monakov ---
In C11 and C++11 the issue of compiler-introduced racing loads is discussed as
follows (5.1.2.4 Multi-threaded executions and data races in C11):
28 NOTE 14 Transformations that introduce a
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #11 from Alexander Monakov ---
(In reply to Alexander Monakov from comment #8)
> inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
> {
> memcpy(p, &x, sizeof(x));
> }
>
>
> We are deciding not to inline this, while
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #10 from Alexander Monakov ---
Ah, the non-static inlines are intentional, the corresponding extern
declarations appear in library/platform_util.c. Sorry, I missed that file the
first time around.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110979
--- Comment #2 from Alexander Monakov ---
Yes, it is wrong-code to the full extent. To demonstrate, you can initialize 'sum'
and the array to negative zeroes:
#define FLT double
#define N 20
__attribute__((noipa))
FLT
foo3 (FLT *a)
{
FLT sum
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
Alexander Monakov changed:
What|Removed |Added
Status|NEW |RESOLVED
Resolution|---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110926
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #8 from Alexander Monakov ---
Why? There's no bswap here, in particular mbedtls_put_unaligned_uint64 is a
straightforward wrapper for memcpy:
inline void mbedtls_put_unaligned_uint64(void *p, uint64_t x)
{
memcpy(p, &x,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110946
--- Comment #9 from Alexander Monakov ---
(In reply to Alexander Monakov from comment #2)
> Note that inline functions in mbedtls/library/alignment.h all miss the
> 'static' qualifier, which affects inlining decisions, and looks like a
>
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273
--- Comment #3 from Alexander Monakov ---
Seems to work fine with explicit '-mincoming-stack-boundary=2' on the command
line, even though it should make no difference for the 32-bit MinGW target.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273
--- Comment #4 from Alexander Monakov ---
Further reduced:
void f()
{
int c[4] = { 0, 0, 0, 0 };
int cc[8] = { 0 };
asm("" :: "m"(c), "m"(cc));
}
Also reproducible with -march=skylake-avx512 or even plain -mavx512f,
retitling.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
--- Comment #8 from Alexander Monakov ---
REG_EH_REGION is handled further down that function, but
copy_reg_eh_region_note_backward does not copy the note. Perhaps it needs
diff --git a/gcc/except.cc b/gcc/except.cc
index
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
--- Comment #10 from Alexander Monakov ---
I think the first patch may result in duplicated notes, so I wouldn't recommend
picking it.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110369
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110273
--- Comment #8 from Alexander Monakov ---
(In reply to Sam James from comment #7)
> We keep getting quite a few reports of this downstream.
Of this mingw32 stack realignment issue specifically, i.e. Wine breakage when
AVX512 is enabled via
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #11 from Alexander Monakov ---
The trapping angle seems valid, but I have a really hard time understanding the
DSE issue, and the preceding issue about disambiguation based on RTL aliasing.
How would DSE optimize out 'd[5] = 1' in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #13 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #12)
> As explained in comment#3 the issue is related to the tree alias oracle
> part that gets invoked on the MEM_EXPR for the load where there is
> no
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #16 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #14)
> vectors of T and scalar T interoperate TBAA wise. What we disambiguate is
>
> int a[2];
>
> int foo(int *p)
> {
> a[0] = 1;
> *(v4si *)p =
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
Alexander Monakov changed:
What|Removed |Added
CC||amonakov at gcc dot gnu.org
---
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110307
--- Comment #5 from Alexander Monakov ---
It's not necessary yet for this particular bug, but might be helpful for future
bugs (if disk space is not an issue).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237
--- Comment #18 from Alexander Monakov ---
(In reply to rguent...@suse.de from comment #17)
> Yes, we do the same to loads. I hope that's not a common technique
> though but I have to admit the vectorizer itself assesses whether it's
> safe to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110202
--- Comment #7 from Alexander Monakov ---
Note that vpxor serves as a dependency-breaking instruction (see PR 110438). So
in negate1 we do the right thing for the wrong reasons, and in negate2 we can
cause a substantial stall if the previous
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110438
--- Comment #1 from Alexander Monakov ---
We might want to omit PXOR when optimizing for size.