On Fri, 16 Feb 2024 at 12:38, Jonathan Wakely <jwak...@redhat.com> wrote:
>
> On Fri, 2 Feb 2024 at 16:52, xndcn <xnd...@gmail.com> wrote:
> >
> > Thank you for your careful review!
> >
> > > But we don't need a new one if it's going to be used in exactly one test 
> > > and if the new option does the same thing for all targets that run the 
> > > test.
> > Got it, thanks. Now add option "-latomic" directly, but it still rely
> > on the trick "[atomic_link_flags [get_multilibs]]"
> >
> > > No, because the patch is supposed to prevent the infinite loop, and so 
> > > there's no need to stop it looping after 10s. It won't loop at all.
> > Thanks, deleted.
> >
> > > We only need to clear padding for long double, not float and double, 
> > > right?
> > Yes, actually there is a check "if constexpr
> > (__atomic_impl::__maybe_has_padding<_Fp>())".
> > But "__atomic_impl::__clear_padding(_M_fp); " is indeed simply, so fixed 
> > here.
> >
> > > Why can't we run this on all targets?
> > Got it, now target option deleted.
> >
> > > There's no reason to use __builtin_memset here, just include <cstring> 
> > > and use std::memcpy.
> > Thanks, fixed.
> >
> > > It definitely does have padding, just say "long double has padding bits 
> > > on x86"
> > Thanks, fixed.
> >
> > So here comes the latest patch:
>
>
> Thanks. I've applied the patch to my tree, but the new test fails
> pretty reliably.
>
> The infinite loop in std::atomic<long double>::fetch_add is fixed by
> clearing padding in the constructor, but the test fails on the
> compare_exchange_weak or compare_exchange_strong lines here:
>
>
> > +    as.fetch_add(t);
> > +    long double s = f + t;
> > +    t = as.load();
> > +    VERIFY(s == t); // padding ignored on float comparing
> > +    fill_padding(s);
> > +    VERIFY(as.compare_exchange_weak(s, f)); // padding cleared on cmpexchg
> > +    fill_padding(f);
> > +    VERIFY(as.compare_exchange_strong(f, t)); // padding cleared on 
> > cmpexchg
> >
>
>
> I think the problem is here in __atomic_impl::__compare_exchange:
>
>    if (__atomic_compare_exchange(__pval, __pexp, __pi,
>                                           __is_weak, int(__s), int(__f)))
>      return true;
>
> Even though padding in *__pexp and *__pi has been cleared, the value
> of *__pval after a successful __atomic_compare_exchange has non-zero
> padding. That means that the next compare_exchange will fail, because
> we assume that the stored value always has zeroed padding bits.
>
> Here's a gdb session showing that __atomic_compare_exchange stores a
> value with non-zero padding:
>
> Breakpoint 2, test01 () at compare_exchange_padding.cc:43
> 43        long double s2 = s;
> (gdb) n
> 44        fill_padding(s2);
> (gdb)
> 45        while (!as.compare_exchange_weak(s2, f)) // padding cleared
> on compexchg
> (gdb) p/x as._M_fp
> $11 = 0x40008000000000000000
> (gdb) step
> std::__atomic_float<long double>::compare_exchange_weak
> (this=0x7fffffffd8c0, __expected=@0x7fffffffd8a0: 2, __desired=0.5,
>     __order=std::memory_order::seq_cst) at
> /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:1387
> 1387            return compare_exchange_weak(__expected, __desired, __order,
> (gdb) step
> std::__atomic_float<long double>::compare_exchange_weak
> (this=0x7fffffffd8c0, __expected=@0x7fffffffd8a0: 2, __desired=0.5,
>     __success=std::memory_order::seq_cst,
> __failure=std::memory_order::seq_cst) at
> /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:1347
> 1347            return __atomic_impl::compare_exchange_weak(&_M_fp,
> (gdb) step
> std::__atomic_impl::compare_exchange_weak<false, long double>
> (__check_padding=false, __failure=std::memory_order::seq_cst,
>     __success=std::memory_order::seq_cst, __desired=0.5,
> __expected=@0x7fffffffd8a0: 2, __ptr=0x7fffffffd8c0)
>     at /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:1123
> 1123            return __atomic_impl::__compare_exchange<_AtomicRef>(
> (gdb)
> std::__atomic_impl::__compare_exchange<false, long double>
> (__f=std::memory_order::seq_cst, __s=std::memory_order::seq_cst,
> __is_weak=true,
>     __i=<optimized out>, __e=@0x7fffffffd8a0: 2,
> __val=@0x7fffffffd8c0: 2) at
> /home/jwakely/gcc/14/include/c++/14.0.1/bits/atomic_base.h:994
> 994             __glibcxx_assert(__is_valid_cmpexch_failure_order(__f));
> (gdb) n
> 997             _Tp* const __pval = std::__addressof(__val);
> (gdb)
> 1008                _Vp* const __pi = __atomic_impl::__clear_padding(__i);
> (gdb)
> 1010                _Vp __exp = __e;
> (gdb)
> 1012                _Vp* const __pexp = __atomic_impl::__clear_padding(__exp);
> (gdb)
> 1016                if (__atomic_compare_exchange(__pval, __pexp, __pi,
> (gdb) p/x *__pval
> $12 = 0x40008000000000000000
> (gdb) p/x *__pexp
> $13 = 0x40008000000000000000
> (gdb) p/x *__pi
> $14 = 0x3ffe8000000000000000
> (gdb) n
> 1018                  return true;
> (gdb) p/x *__pval
> $15 = 0x7ffff7bf3ffe8000000000000000
> (gdb)
>
> We stored *__pi which has zero padding, but the result in *__pval has
> non-zero padding. This doesn't seem to be gdb being misleading by
> loading *__pval into a FP register which doesn't preserve the zero
> padding, because if I do this then it fails:
>
>   as.fetch_add(t);
>   VERIFY(as.load() == s);
>   __builtin_clear_padding(&s);
>   VERIFY( std::memcmp(&s, &as, sizeof(s)) == 0 );
>
> So the value stored by fetch_add (which uses compare_exchange_weak in
> a loop) really doesn't have zero padding bits.
>
> I'm not sure what's going on here yet, maybe __atomic_compare_exchange
> recognizes that the pointers are long double* and so only copies the
> non-padding bits into the value?

Ah, although __atomic_compare_exchange only takes pointers, the
compiler replaces that with a call to __atomic_compare_exchange_n
which takes the newval by value, which presumably uses an 80-bit FP
register and so the padding bits become indeterminate again.

Maybe we can make std::atomic<long double> actually store a 16-byte
struct containing an opaque char buffer, which we store a long double
into. That should stop FP registers being used for it. We never need
an actual long double, because we only access the value using the
__atomic_xxx built-ins.

Maybe we need that for every std::atomic<T> where T has padding bits,
because even for struct S { int i; char c; } passing S by value could
be scalarized and only copy the value representation, not preserve the
padding that we've carefully cleared.



>
> I think I'll push the fix to the __atomic_float constructor, but with
> a simplified test case that doesn't include the
> compare_exchange_{weak,strong} calls, until I figure out how to fix
> it.

Reply via email to