[Bug target/80817] [missed optimization][x86] relaxed atomics
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |witold.baryluk+gcc at gmail dot com

--- Comment #5 from Andrew Pinski ---
*** Bug 103966 has been marked as a duplicate of this bug. ***
Andrew Pinski changed:

           What    |Removed              |Added
----------------------------------------------------------------------------
   Last reconfirmed|2017-05-20 00:00:00  |2021-12-28
           Severity|normal               |enhancement
--- Comment #4 from Alexander Monakov ---
On 32-bit x86, manipulating 64-bit integers, let alone atomically, is going to be inconvenient. The emitted code could have been shorter: instead of

        movl    (%esp), %eax
        movl    4(%esp), %edx
        addl    $1, %eax
        adcl    $0, %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)

it would be sufficient to emit

        addl    $1, (%esp)
        adcl    $0, 4(%esp)

(it seems the stack slots holding the loaded value have been made volatile, wrongly?), and with -msse2 it could have used an SSE load/add/store, but that needs enhancements in the STV pass, I guess.
--- Comment #3 from Joost VandeVondele ---
If I compile with -m32:

        gcc -std=c++11 -m32 -S -O3 test.cpp

I get

        .cfi_startproc
        subl    $12, %esp
        .cfi_def_cfa_offset 16
        movl    16(%esp), %ecx
        fildq   (%ecx)
        fistpq  (%esp)
        movl    (%esp), %eax
        movl    4(%esp), %edx
        addl    $1, %eax
        adcl    $0, %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        fildq   (%esp)
        fistpq  (%ecx)
        addl    $12, %esp
        .cfi_def_cfa_offset 4
        ret
        .cfi_endproc

Is the above expected? This causes a measurable slowdown in the piece of code I'm looking at.
Joost VandeVondele changed:

           What    |Removed      |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED  |NEW
   Last reconfirmed|             |2017-05-20
                 CC|             |Joost.VandeVondele at mat dot ethz.ch
     Ever confirmed|0            |1
--- Comment #2 from Marc Glisse ---
(In reply to Alexander Monakov from comment #1)
> void f(volatile int *p)
> {
>     ++*p;
> }

That's PR 50677, for instance. Some targets do handle it; there have been discussions in the past. This seems to require special care for every instruction of every target that wants to allow the simplification.
Alexander Monakov changed:

           What    |Removed  |Added
----------------------------------------------------------------------------
                 CC|         |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov ---
In the second example it is correct that lock; addq is generated: the read-modify-write operation still needs to be atomic itself; memory_order_relaxed only indicates that it does not imply an order with respect to other memory operations.

The first example could only be optimized at the RTL level (on GIMPLE there are no memory read-modify-write operations), but on RTL atomic accesses are represented as unspecs or volatile accesses (they can't be plain accesses because the compiler may not tear them etc., but there is no special RTL for an atomic access, so a volatile MEM is the best fit). So on RTL it's similar to how

void f(volatile int *p)
{
    ++*p;
}

is not optimized either (and the issue is visible only on CISC-ish targets with composite memory read-modify-write instructions; otherwise the load and store would be separate anyway).