[Bug target/80817] [missed optimization][x86] relaxed atomics

2022-01-10 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

Andrew Pinski  changed:

   What|Removed |Added

 CC||witold.baryluk+gcc at gmail dot com

--- Comment #5 from Andrew Pinski  ---
*** Bug 103966 has been marked as a duplicate of this bug. ***

[Bug target/80817] [missed optimization][x86] relaxed atomics

2021-12-28 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed|2017-05-20 00:00:00 |2021-12-28
   Severity|normal  |enhancement

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-22 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

--- Comment #4 from Alexander Monakov  ---
On 32-bit x86, manipulating 64-bit integers, let alone atomically, is
going to be inconvenient. The emitted code could have been shorter:
instead of


movl    (%esp), %eax
movl    4(%esp), %edx
addl    $1, %eax
adcl    $0, %edx
movl    %eax, (%esp)
movl    %edx, 4(%esp)

it would be sufficient to emit

addl    $1, (%esp)
adcl    $0, 4(%esp)

(it seems the stack slots holding the loaded value have been made
volatile, wrongly?), and with -msse2 it could have used an SSE
load/add/store, but that would need enhancements in the STV pass, I
guess.
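
For reference, a minimal C++ sketch of such an SSE load/add/store
sequence (the function name and intrinsic choices are illustrative,
not from the report); the movq load and store are single 8-byte
accesses even on 32-bit x86, which is what makes this shape
attractive here:

#include <emmintrin.h>  // SSE2

// Illustrative only: increment a 64-bit value in memory via SSE2,
// roughly the sequence the STV pass would have to synthesize from
// the scalar load/add-with-carry/store above.
void inc64_sse2(unsigned long long *p)
{
    __m128i v   = _mm_loadl_epi64((const __m128i *)p); // movq: one 8-byte load
    __m128i one = _mm_cvtsi32_si128(1);                // low 64-bit lane = 1
    v = _mm_add_epi64(v, one);                         // paddq
    _mm_storel_epi64((__m128i *)p, v);                 // movq: one 8-byte store
}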

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-22 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

--- Comment #3 from Joost VandeVondele ---
If I compile with -m32

gcc -std=c++11 -m32 -S -O3 test.cpp

I get:

.cfi_startproc
subl    $12, %esp
.cfi_def_cfa_offset 16
movl    16(%esp), %ecx
fildq   (%ecx)
fistpq  (%esp)
movl    (%esp), %eax
movl    4(%esp), %edx
addl    $1, %eax
adcl    $0, %edx
movl    %eax, (%esp)
movl    %edx, 4(%esp)
fildq   (%esp)
fistpq  (%ecx)
addl    $12, %esp
.cfi_def_cfa_offset 4
ret
.cfi_endproc


Is the above expected? It causes a measurable slowdown in the piece of
code I'm looking at.
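
(test.cpp itself is not quoted in this digest; the fildq/fistpq pairs
around the add are how GCC performs 8-byte atomic loads and stores on
32-bit x86, so the testcase was presumably a relaxed load / plain add /
relaxed store along these lines — the names below are assumed:)

#include <atomic>
#include <cstdint>

// Assumed shape of the testcase: three separate operations (relaxed
// load, plain 64-bit add, relaxed store), not an atomic RMW, so no
// lock prefix is required.
void f(std::atomic<std::uint64_t> *p)
{
    p->store(p->load(std::memory_order_relaxed) + 1,
             std::memory_order_relaxed);
}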

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-20 Thread Joost.VandeVondele at mat dot ethz.ch
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

Joost VandeVondele  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2017-05-20
 CC||Joost.VandeVondele at mat dot ethz.ch
 Ever confirmed|0   |1

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-18 Thread glisse at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

--- Comment #2 from Marc Glisse  ---
(In reply to Alexander Monakov from comment #1)
> void f(volatile int *p)
> {
>   ++*p;
> }


That's PR 50677, for instance. Some targets do handle it, and there
have been discussions in the past; this seems to require special care
for every instruction of every target that wants to allow the
simplification.

[Bug target/80817] [missed optimization][x86] relaxed atomics

2017-05-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80817

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
In the second example it's correct that lock;addq is generated: the
read-modify-write operation itself still needs to be atomic;
memory_order_relaxed only indicates that it does not impose an ordering
with respect to other memory operations.
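
(A minimal sketch of that second case, with names assumed rather than
taken from the report:)

#include <atomic>
#include <cstdint>

// Must remain a single atomic read-modify-write (lock add on x86):
// memory_order_relaxed drops ordering with respect to other memory
// operations, not the atomicity of the RMW itself.
void g(std::atomic<std::uint64_t> *p)
{
    p->fetch_add(1, std::memory_order_relaxed);
}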

The first example could only be optimized at the RTL level (on GIMPLE
there are no memory read-modify-write operations), but on RTL atomic
accesses are represented as unspecs or volatile accesses (they can't be
plain accesses because the compiler may not tear them, etc., but there
is no special RTL for atomic accesses, so a volatile MEM is the best
fit). So on RTL it's similar to how

void f(volatile int *p)
{
  ++*p;
}

is not optimized either (and the issue is visible only on CISC-ish
targets with composite memory read-modify-write instructions; otherwise
the load and store would be separate anyway).