On Fri, 16 Feb 2024 at 12:38, Jonathan Wakely <jwak...@redhat.com> wrote: > > On Fri, 2 Feb 2024 at 16:52, xndcn <xnd...@gmail.com> wrote: > > > > Thank you for your careful review! > > > > > But we don't need a new one if it's going to be used in exactly one test > > > and if the new option does the same thing for all targets that run the > > > test. > > Got it, thanks. Now add option "-latomic" directly, but it still rely > > on the trick "[atomic_link_flags [get_multilibs]]" > > > > > No, because the patch is supposed to prevent the infinite loop, and so > > > there's no need to stop it looping after 10s. It won't loop at all. > > Thanks, deleted. > > > > > We only need to clear padding for long double, not float and double, > > > right? > > Yes, actually there is a check "if constexpr > > (__atomic_impl::__maybe_has_padding<_Fp>())". > > But "__atomic_impl::__clear_padding(_M_fp); " is indeed simply, so fixed > > here. > > > > > Why can't we run this on all targets? > > Got it, now target option deleted. > > > > > There's no reason to use __builtin_memset here, just include <cstring> > > > and use std::memcpy. > > Thanks, fixed. > > > > > It definitely does have padding, just say "long double has padding bits > > > on x86" > > Thanks, fixed. > > > > So here comes the latest patch: > > > Thanks. I've applied the patch to my tree, but the new test fails > pretty reliably. > > The infinite loop in std::atomic<long double>::fetch_add is fixed by > clearing padding in the constructor, but the test fails on the > compare_exchange_weak or compare_exchange_strong lines here: > > > > + as.fetch_add(t); > > + long double s = f + t; > > + t = as.load(); > > + VERIFY(s == t); // padding ignored on float comparing > > + fill_padding(s); > > + VERIFY(as.compare_exchange_weak(s, f)); // padding cleared on cmpexchg > > + fill_padding(f); > > + VERIFY(as.compare_exchange_strong(f, t)); // padding cleared on > > cmpexchg > > > > > I think the problem is here in __atomic_impl::__compare_exchange: > > if (__atomic_compare_exchange(__pval, __pexp, __pi, > __is_weak, int(__s), int(__f))) > return true; > > Even though padding in *__pexp and *__pi has been cleared, the value > of *__pval after a successful __atomic_compare_exchange has non-zero > padding. That means that the next compare_exchange will fail, because > we assume that the stored value always has zeroed padding bits. > > Here's a gdb session showing that __atomic_compare_exchange stores a > value with non-zero padding: > > Breakpoint 2, test01 () at compare_exchange_padding.cc:43 > 43 long double s2 = s; > (gdb) n > 44 fill_padding(s2); > (gdb) > 45 while (!as.compare_exchange_weak(s2, f)) // padding cleared > on compexchg > (gdb) p/x as._M_fp > $11 = 0x40008000000000000000 > (gdb) step > std::__atomic_float<long double>::compare_exchange_weak > (this=0x7fffffffd8c0, __expected=@0x7fffffffd8a0: 2, __desired=0.5, > __order=std::memory_order::seq_cst) at > /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:1387 > 1387 return compare_exchange_weak(__expected, __desired, __order, > (gdb) step > std::__atomic_float<long double>::compare_exchange_weak > (this=0x7fffffffd8c0, __expected=@0x7fffffffd8a0: 2, __desired=0.5, > __success=std::memory_order::seq_cst, > __failure=std::memory_order::seq_cst) at > /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:1347 > 1347 return __atomic_impl::compare_exchange_weak(&_M_fp, > (gdb) step > std::__atomic_impl::compare_exchange_weak<false, long double> > (__check_padding=false, __failure=std::memory_order::seq_cst, > __success=std::memory_order::seq_cst, __desired=0.5, > __expected=@0x7fffffffd8a0: 2, __ptr=0x7fffffffd8c0) > at /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:1123 > 1123 return __atomic_impl::__compare_exchange<_AtomicRef>( > (gdb) > std::__atomic_impl::__compare_exchange<false, long double> > (__f=std::memory_order::seq_cst, __s=std::memory_order::seq_cst, > __is_weak=true, > __i=<optimized out>, __e=@0x7fffffffd8a0: 2, > __val=@0x7fffffffd8c0: 2) at > /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:994 > 994 __glibcxx_assert(__is_valid_cmpexch_failure_order(__f)); > (gdb) n > 997 _Tp* const __pval = std::__addressof(__val); > (gdb) > 1008 _Vp* const __pi = __atomic_impl::__clear_padding(__i); > (gdb) > 1010 _Vp __exp = __e; > (gdb) > 1012 _Vp* const __pexp = __atomic_impl::__clear_padding(__exp); > (gdb) > 1016 if (__atomic_compare_exchange(__pval, __pexp, __pi, > (gdb) p/x *__pval > $12 = 0x40008000000000000000 > (gdb) p/x *__pexp > $13 = 0x40008000000000000000 > (gdb) p/x *__pi > $14 = 0x3ffe8000000000000000 > (gdb) n > 1018 return true; > (gdb) p/x *__pval > $15 = 0x7ffff7bf3ffe8000000000000000 > (gdb) > > We stored *__pi which has zero padding, but the result in *__pval has > non-zero padding. This doesn't seem to be gdb being misleading by > loading *__pval into a FP register which doesn't preserve the zero > padding, because if I do this then it fails: > > as.fetch_add(t); > VERIFY(as.load() == s); > __builtin_clear_padding(&s); > VERIFY( std::memcmp(&s, &as, sizeof(s)) == 0 ); > > So the value stored by fetch_add (which uses compare_exchange_weak in > a loop) really doesn't have zero padding bits. > > I'm not sure what's going on here yet, maybe __atomic_compare_exchange > recognizes that the pointers are long double* and so only copies the > non-padding bits into the value?
Ah, although __atomic_compare_exchange only takes pointers, the compiler replaces that with a call to __atomic_compare_exchange_n which takes the newval by value, which presumably uses an 80-bit FP register and so the padding bits become indeterminate again. Maybe we can make std::atomic<long double> actually store a 16-byte struct containing an opaque char buffer, which we store a long double into. That should stop FP registers being used for it. We never need an actual long double, because we only access the value using the __atomic_xxx built-ins. Maybe we need that for every std::atomic<T> where T has padding bits, because even for struct S { int i; char c; } passing S by value could be scalarized and only copy the value representation, not preserve the padding that we've carefully cleared. > > I think I'll push the fix to the __atomic_float constructor, but with > a simplified test case that doesn't include the > compare_exchange_{weak,strong} calls, until I figure out how to fix > it.