https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70823
Bug ID: 70823
Summary: x86_64: __atomic_fetch_and/or/xor() should perhaps use
BTR/BTS/BTC if they can
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: dhowells at redhat dot com
Target Milestone: ---
Created attachment 38347
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38347&action=edit
Test source
If given a mask that clears, sets or flips a single bit and the result is
checked for just that bit and reduced to bool, then the __atomic_fetch_and, _or
and _xor functions should consider using BTR, BTS or BTC as appropriate.
So, something like:
    static __always_inline bool test_and_set_bit(unsigned bit, unsigned long *ptr)
    {
            unsigned long mask = 1UL << (bit & (BITS_PER_LONG - 1));
            unsigned long old;

            ptr += bit / BITS_PER_LONG;
            old = __atomic_fetch_or(ptr, mask, __ATOMIC_SEQ_CST);
            return old & mask;
    }
where the mask is constructed as 1UL << bitnr. As things stand, the example
above compiles to a CMPXCHG loop rather than a BTS instruction:
       b:   89 f9                   mov    %edi,%ecx
       d:   ba 01 00 00 00          mov    $0x1,%edx
      12:   c1 ef 06                shr    $0x6,%edi
      15:   48 d3 e2                shl    %cl,%rdx
      18:   89 f9                   mov    %edi,%ecx
      1a:   48 8b 04 ce             mov    (%rsi,%rcx,8),%rax
      1e:   49 89 c0                mov    %rax,%r8
      21:   48 89 c7                mov    %rax,%rdi
      24:   49 09 d0                or     %rdx,%r8
      27:   f0 4c 0f b1 04 ce       lock cmpxchg %r8,(%rsi,%rcx,8)
      2d:   75 ef                   jne    1e <set_bit+0x13>
      2f:   48 85 fa                test   %rdi,%rdx
      32:   0f 95 c0                setne  %al
      35:   c3                      retq
Could we instead get something like:
            lock bts %edi,(%rsi)
            setc     %al
            retq
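For comparison, here is a minimal sketch of a hand-written equivalent of the requested output, using GCC extended inline assembly. The function name test_and_set_bit_bts and the BITS_PER_LONG definition are assumptions made for illustration and are not taken from the attached test source. The sketch relies on BTS copying the previous bit value into the carry flag, which is why it reads the result with SETC, and on the LOCK prefix to make the read-modify-write atomic. On targets other than LP64 x86_64 it falls back to __atomic_fetch_or.

```c
#include <stdbool.h>
#include <limits.h>

/* Assumed definition; the report uses BITS_PER_LONG without showing it. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: atomically set a bit and return its previous value. */
static inline bool test_and_set_bit_bts(unsigned bit, unsigned long *ptr)
{
#if defined(__x86_64__) && defined(__LP64__)
    bool old;

    /*
     * With a memory operand and a register bit offset, BTS selects the
     * word itself, so no explicit "ptr += bit / BITS_PER_LONG" is needed.
     * The "memory" clobber covers words beyond *ptr that may be touched.
     */
    __asm__ __volatile__("lock btsq %2, %0\n\t"
                         "setc %1"
                         : "+m"(*ptr), "=q"(old)
                         : "r"((unsigned long)bit)
                         : "memory", "cc");
    return old;
#else
    /* Portable fallback: same semantics via the builtin. */
    unsigned long mask = 1UL << (bit % BITS_PER_LONG);
    unsigned long old;

    ptr += bit / BITS_PER_LONG;
    old = __atomic_fetch_or(ptr, mask, __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
#endif
}
```

Calling test_and_set_bit_bts(5, words) on a zeroed array returns false the first time and true on every later call, since the bit stays set.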
See the attached test source which should be compiled to a .s file.
The CMPXCHG loop is generated by all of the following compilers:
gcc version 5.3.1 20151207 (Red Hat 5.3.1-2) (GCC)
gcc version 6.0.0 20160219 (Red Hat Cross 6.0.0-0.1) (GCC)
gcc version 4.8.5 20150623 (Red Hat 4.8.5-2.x) (GCC)
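The description names all three operations, but the example only shows the OR case. The following sketch spells out the AND and XOR counterparts as well, so that the patterns which might map to BTR, BTS and BTC are visible side by side. The names test_and_clear_bit, test_and_change_bit, bit_mask and bit_word, and the BITS_PER_LONG definition, are assumptions made for illustration and may differ from the attached test source.

```c
#include <stdbool.h>
#include <limits.h>

/* Assumed definition; the report uses BITS_PER_LONG without showing it. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helpers: single-bit mask and containing word for a bit number. */
static inline unsigned long bit_mask(unsigned bit)
{
    return 1UL << (bit % BITS_PER_LONG);
}

static inline unsigned long *bit_word(unsigned bit, unsigned long *ptr)
{
    return ptr + bit / BITS_PER_LONG;
}

/* OR with a single-bit mask, result tested for that bit: candidate for BTS. */
static inline bool test_and_set_bit(unsigned bit, unsigned long *ptr)
{
    unsigned long mask = bit_mask(bit);
    unsigned long old = __atomic_fetch_or(bit_word(bit, ptr), mask,
                                          __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

/* AND with the inverted single-bit mask: candidate for BTR. */
static inline bool test_and_clear_bit(unsigned bit, unsigned long *ptr)
{
    unsigned long mask = bit_mask(bit);
    unsigned long old = __atomic_fetch_and(bit_word(bit, ptr), ~mask,
                                           __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}

/* XOR with a single-bit mask: candidate for BTC. */
static inline bool test_and_change_bit(unsigned bit, unsigned long *ptr)
{
    unsigned long mask = bit_mask(bit);
    unsigned long old = __atomic_fetch_xor(bit_word(bit, ptr), mask,
                                           __ATOMIC_SEQ_CST);
    return (old & mask) != 0;
}
```

In each function the only use of the fetched value is the test against the same single-bit mask, which is the condition under which the report suggests the bit-test instructions could replace the CMPXCHG loop.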