Committed with Palmer's suggestions for the commit message. I also plan
to backport this to the 11, 12 and 13 release branches :)
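
For reference, the committed change boils down to sign-extending the
expected value in the atomic_compare_and_swap<mode> expander before it
reaches the word-mode comparison; roughly this shape (a sketch of the
hunk in config/riscv/sync.md, see the actual commit for the exact code):

    if (word_mode != <MODE>mode && operands[2] != CONST0_RTX (<MODE>mode))
      {
        /* RV64 has no SImode compare, so make sure the expected value
           is sign-extended to word_mode before it meets the
           sign-extended lr.w result.  */
        rtx tmp = gen_reg_rtx (word_mode);
        emit_insn (gen_extend_insn (tmp, operands[2], word_mode,
                                    <MODE>mode, 0));
        operands[2] = gen_lowpart (<MODE>mode, tmp);
      }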

On Thu, Feb 29, 2024 at 4:27 AM Palmer Dabbelt <pal...@dabbelt.com> wrote:
>
> On Wed, 28 Feb 2024 09:36:38 PST (-0800), Patrick O'Neill wrote:
> >
> > On 2/28/24 07:02, Palmer Dabbelt wrote:
> >> On Wed, 28 Feb 2024 06:57:53 PST (-0800), jeffreya...@gmail.com wrote:
> >>>
> >>>
> >>> On 2/28/24 05:23, Kito Cheng wrote:
> >>>> atomic_compare_and_swapsi will use lr.w and sc.w to do the atomic
> >>>> operation on
> >>>> RV64, however lr.w is doing sign extend to DI and compare
> >>>> instruction only have
> >>>> DI mode on RV64, so the expected value should be sign extend before
> >>>> compare as
> >>>> well, so that we can get right compare result.
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>>     PR target/114130
> >>>>     * config/riscv/sync.md (atomic_compare_and_swap<mode>): Sign
> >>>>     extend the expected value if needed.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>>     * gcc.target/riscv/pr114130.c: New.
> >>> Nearly rejected this as I think the description was a bit ambiguous and
> >>> I thought you were extending the result of the lr.w.  But it's actually
> >>> the other value you're ensuring gets properly extended.
> >>
> >> I had the same response, but after reading it I'm not quite sure how
> >> to say it better.
>
> Maybe something like
>
>     atomic_compare_and_swapsi will use lr.w to obtain the original value,
>     which sign extends to DI.  RV64 only has DI comparisons, so we also need
>     to sign extend the expected value to DI as otherwise the comparison will
>     fail when the expected value has the 32nd bit set.
>
> would do it?  Either way
>
> Reviewed-by: Palmer Dabbelt <pal...@rivosinc.com>
>
> as I've managed to convince myself it's correct.  We should probably
> backport this one; the bug has likely been around for a while.
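>
> For illustration (a hypothetical reproducer, not necessarily what the
> committed pr114130.c test does -- the function name and setup here are
> made up), the failure needs an expected value with bit 31 set, so that
> the sign-extended lr.w result and an unextended expected value disagree
> in the high bits:
>
>     #include <stdint.h>
>
>     /* With *p == *e == 0x80000000, lr.w yields 0xffffffff80000000 in
>        the register; if the expected value is left unextended, the
>        DI-mode bne sees different high bits and the CAS spuriously
>        fails even though the low 32 bits match.  */
>     int bar(uint32_t *p, uint32_t *e, uint32_t *d) {
>         return __atomic_compare_exchange(p, e, d, 0, __ATOMIC_RELAXED,
>                                          __ATOMIC_RELAXED);
>     }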
>
> >>
> >>> OK.
> >>
> >> I was looking at the code to try to work out whether we have the same
> >> bug for the short inline CAS routines, but I've got to run to some
> >> meetings...
> >
> > I don't think subword AMO CAS is impacted.
> >
> > As part of the CAS we mask both the expected value [2] and the retrieved
> > value [1] before comparing.
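> >
> > To spell out why the masking makes that safe (a sketch, not the
> > actual expander code): once both sides go through the AND, any
> > disagreement above the subword field -- sign-extension bits
> > included -- is discarded before the branch:
> >
> >     #include <stdint.h>
> >
> >     /* Both values are masked to the in-word field before comparing,
> >        so stale or sign-extended high bits on either side can't
> >        cause a spurious mismatch.  */
> >     static int subword_match(uint64_t retrieved, uint64_t expected,
> >                              uint64_t mask) {
> >         return (retrieved & mask) == (expected & mask);
> >     }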
>
> I'm always a bit lost when it comes to bit arithmetic, but I think it's
> OK.  It smells like it's being a little loose with the
> extensions/comparisons, but just looking at some generated code for this
> simple case:
>
>     #include <stdint.h>
>
>     void foo(uint16_t *p, uint16_t *e, uint16_t *d) {
>         __atomic_compare_exchange(p, e, d, 0, __ATOMIC_RELAXED,
>                                   __ATOMIC_RELAXED);
>     }
>
>     foo:
>             lhu     a3,0(a2)
>             lhu     a2,0(a1)
>             andi    a4,a0,3
>             li      a5,65536
>             slliw   a4,a4,3
>             addiw   a5,a5,-1
>             sllw    a5,a5,a4
>             sllw    a3,a3,a4
>             sllw    a7,a2,a4
>             andi    a0,a0,-4
>             and     a3,a3,a5
>             not     t1,a5
>             and     a7,a7,a5
>             1:
>             lr.w    a6, 0(a0)
>             and     t3, a6, a5    // Both a6 (from the lr.w) and a5
>                                   // (from the sllw) are sign extended,
>                                   // so the result in t3 is sign extended.
>             bne     t3, a7, 1f    // a7 is also sign extended (before
>                                   // and after the masking above), so
>                                   // it's safe for comparison
>             and     t3, a6, t1
>             or      t3, t3, a3
>             sc.w    t3, t3, 0(a0) // The top bits of t3 end up polluted
>                                   // with sign extension, but it doesn't
>                                   // matter because of the sc.w.
>             bnez    t3, 1b
>             1:
>             sraw    a6,a6,a4
>             slliw   a2,a2,16
>             slliw   a5,a6,16
>             sraiw   a2,a2,16
>             sraiw   a5,a5,16
>             subw    a5,a5,a2
>             beq     a5,zero,.L1
>             sh      a6,0(a1)
>     .L1:
>             ret
>
> So I think we're OK -- that masking of a7 looks redundant here, but I
> don't think we could get away with just
>
>     diff --git a/gcc/config/riscv/sync.md b/gcc/config/riscv/sync.md
>     index 54bb0a66518..15956940032 100644
>     --- a/gcc/config/riscv/sync.md
>     +++ b/gcc/config/riscv/sync.md
>     @@ -456,7 +456,6 @@ (define_expand "atomic_cas_value_strong<mode>"
>        riscv_lshift_subword (<MODE>mode, o, shift, &shifted_o);
>        riscv_lshift_subword (<MODE>mode, n, shift, &shifted_n);
>
>     -  emit_move_insn (shifted_o, gen_rtx_AND (SImode, shifted_o, mask));
>        emit_move_insn (shifted_n, gen_rtx_AND (SImode, shifted_n, mask));
>
>        enum memmodel model_success = (enum memmodel) INTVAL (operands[4]);
>
> because we'd need the masking for when we don't know the high bits are
> safe pre-shift.  So maybe some sort of simplify call could help out
> there, but I bet it's not really worth bothering -- the bookkeeping
> doesn't generally matter that much around AMOs.
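>
> Concretely (a sketch of the hazard, not the expander code itself): if
> the expected subword arrives with junk above its low bits, the shift
> moves that junk up alongside the field, and only the AND keeps it out
> of the comparison:
>
>     #include <stdint.h>
>
>     /* Without the final AND, any bits of 'o' above the subword would
>        survive the left shift and make the bne compare garbage against
>        the masked lr.w result.  */
>     static uint32_t shifted_expected(uint32_t o, int shift,
>                                      uint32_t mask) {
>         return (o << shift) & mask;
>     }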
>
> > - Patrick
> >
> > [1]:
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/sync.md;h=54bb0a66518ae353fa4ed640339213bf5da6682c;hb=refs/heads/master#l495
> > [2]:
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/config/riscv/sync.md;h=54bb0a66518ae353fa4ed640339213bf5da6682c;hb=refs/heads/master#l459
> >
> >>
> >>>
> >>> Jeff
