Committed with Palmer's suggestions for the commit message, also I plan to back port that to 11, 12 and 13 release branches :)
On Thu, Feb 29, 2024 at 4:27 AM Palmer Dabbelt <pal...@dabbelt.com> wrote: > > On Wed, 28 Feb 2024 09:36:38 PST (-0800), Patrick O'Neill wrote: > > > > On 2/28/24 07:02, Palmer Dabbelt wrote: > >> On Wed, 28 Feb 2024 06:57:53 PST (-0800), jeffreya...@gmail.com wrote: > >>> > >>> > >>> On 2/28/24 05:23, Kito Cheng wrote: > >>>> atomic_compare_and_swapsi will use lr.w and sc.w to do the atomic > >>>> operation on > >>>> RV64, however lr.w is doing sign extend to DI and compare > >>>> instruction only have > >>>> DI mode on RV64, so the expected value should be sign extend before > >>>> compare as > >>>> well, so that we can get right compare result. > >>>> > >>>> gcc/ChangeLog: > >>>> > >>>> PR target/114130 > >>>> * config/riscv/sync.md (atomic_compare_and_swap<mode>): Sign > >>>> extend the expected value if needed. > >>>> > >>>> gcc/testsuite/ChangeLog: > >>>> > >>>> * gcc.target/riscv/pr114130.c: New. > >>> Nearly rejected this as I think the description was a bit ambiguous and > >>> I thought you were extending the result of the lr.w. But it's actually > >>> the other value you're ensuring gets properly extended. > >> > >> I had the same response, but after reading it I'm not quite sure how > >> to say it better. > > Maybe something like > > atomic_compare_and_swapsi will use lr.w to do obtain the original value, > which sign extends to DI. RV64 only has DI comparisons, so we also need > to sign extend the expected value to DI as otherwise the comparison will > fail when the expected value has the 32nd bit set. > > would do it? Either way > > Reviewed-by: Palmer Dabbelt <pal...@rivosinc.com> > > as I've managed to convince myself it's correct. We should probably > backport this one, the bug has likely been around for a while. > > >> > >>> OK. > >> > >> I was looking at the code to try and ask if we have the same bug for > >> the short inline CAS routines, but I've got to run to some meetings... > > > > I don't think subword AMO CAS is impacted. > > > > As part of the CAS we mask both the expected value [2] and the retrieved > > value[1] before comparing. > > I'm always a bit lost when it comes to bit arithmetic, but I think it's > OK. It smells like it's being a little loose with the > extensions/comparisons, but just looking at some generated code for this > simple case: > > void foo(uint16_t *p, uint16_t *e, uint16_t *d) { > __atomic_compare_exchange(p, e, d, 0, __ATOMIC_RELAXED, > __ATOMIC_RELAXED); > } > > foo: > lhu a3,0(a2) > lhu a2,0(a1) > andi a4,a0,3 > li a5,65536 > slliw a4,a4,3 > addiw a5,a5,-1 > sllw a5,a5,a4 > sllw a3,a3,a4 > sllw a7,a2,a4 > andi a0,a0,-4 > and a3,a3,a5 > not t1,a5 > and a7,a7,a5 > 1: > lr.w a6, 0(a0) > and t3, a6, a5 // Both a6 (from the lr.w) and a5 > // (from the sllw) are sign extended, > // so the result in t3 is sign extended. > bne t3, a7, 1f // a7 is also sign extended (before > // and after the masking above), so > // it's safe for comparison > and t3, a6, t1 > or t3, t3, a3 > sc.w t3, t3, 0(a0) // The top bits of t3 end up polluted > // with sign extension, but it doesn't > // matter because of the sc.w. > bnez t3, 1b > 1: > sraw a6,a6,a4 > slliw a2,a2,16 > slliw a5,a6,16 > sraiw a2,a2,16 > sraiw a5,a5,16 > subw a5,a5,a2 > beq a5,zero,.L1 > sh a6,0(a1) > .L1: > ret > > So I think we're OK -- that masking of a7 looks redundant here, but I > don't think we could get away with just > > diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md > index 54bb0a66518..15956940032 100644 > --- a/gcc/config/riscv/sync.md > +++ b/gcc/config/riscv/sync.md > @@ -456,7 +456,6 @@ (define_expand "atomic_cas_value_strong<mode>" > riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o); > riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n); > > - emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask)); > emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask)); > > enum memmodel model_success = (enum memmodel) INTVAL (operands[4]); > > because we'd need the masking for when we don't know the high bits are > safe pre-shift. So maybe some sort of simplify call could help out > there, but I bet it's not really worth bothering -- the bookeeping > doesn't generally matter that much around AMOs. > > > - Patrick > > > > [1]: > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/sync.md;h=54bb0a66518ae353fa4ed640339213bf5da6682c;hb=refs/heads/master#l495 > > [2]: > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/sync.md;h=54bb0a66518ae353fa4ed640339213bf5da6682c;hb=refs/heads/master#l459 > > > >> > >>> > >>> Jeff