On Tue, 11 Oct 2022 16:31:25 PDT (-0700), Vineet Gupta wrote:
On 10/11/22 13:46, Christoph Müllner wrote:
On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <pal...@dabbelt.com> wrote:
On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>> This series provides a cleanup of the current atomics implementation
>> of RISC-V:
>>
>> * PR100265: Use proper fences for atomic load/store
>> * PR100266: Provide programmatic implementation of CAS
>>
>> As both are very related, I merged the patches into one series.
>>
>> The first patch could be squashed into the following patches,
>> but I found it easier to understand the changes with it in place.
>>
>> The series has been tested as follows:
>> * Building and testing a multilib RV32/64 toolchain
>> (bootstrapped with riscv-gnu-toolchain repo)
>> * Manual review of generated sequences for GCC's atomic builtins API
>>
>> The programmatic re-implementation of CAS benefits from a REE improvement
>> (see PR100264):
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>> If this patch is not in place, then an additional extension instruction
>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>
>> Further, the new CAS code requires cbranch INSN helpers to be present:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>
> I was wondering if this patchset is blocked on some technical grounds.
There's a v3 (though I can't find all of it, so not quite sure what
happened), but IIUC that still has the same fundamental problems that
all these have had: changing over to the new fence model may be an ABI
break, and the split CAS implementation doesn't ensure eventual success
(see Jim's comments). Not sure if there's other comments floating
around, though, that's just what I remember.
v3 was sent on May 27, 2022, when I rebased this on an internal tree:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme
register pressure instead of erroring out) as I thought that this was
the blocker for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU
Cauldron about this series, that the ABI break is the blocker.
Yeah I was confused about the ABI aspect as I didn't see any mention of
that in the public reviews of v1 and v2.
Sorry, I thought we'd talked about it somewhere but it must have just
been in meetings and such. Patrick was writing a similar patch set
around the same time so it probably just got tied up in that, we ended
up reducing it to just the strong CAS inline stuff because we couldn't
sort out the correctness of the rest of it.
My initial understanding was that fixing something broken cannot be an
ABI break.
And that the mismatch of the implementation in 2021 and the
recommended mappings in the ratified specification from 2019 is
something that is broken. I still don't know the background here, but
I guess this assumption is incorrect from a historical point of view.
We agreed that we wouldn't break binaries back when we submitted the
port. The ISA has changed many times since then, including adding the
recommended mappings, but those binaries exist and we can't just
silently break things for users.
However, I'm sure that I am not the only one who expects the mappings
in the specification to be implemented in compilers and tools.
Therefore I still consider the implementation of the RISC-V atomics in
GCC as broken (at least w.r.t. user expectation from people that lack
the historical background and just read the RISC-V specification).
You can't just read one of those RISC-V PDFs and assume that
implementations that match those words will function correctly. Those
words regularly change in ways where reasonable readers would end up
with incompatible implementations due to those differences. That's why
we're so explicit about versions and such these days, we're just getting
burned by these old mappings because they're from back when we thought
the RISC-V definition of compatibility was going to match the more
common one and we didn't build in fallbacks.
+Andrea, in case he has time to look at the memory model / ABI issues.
We'd still need to sort out the CAS issues, though, and it's not
abundantly clear it's worth the work: we're essentially constrained to
just emitting those fixed CAS sequences due to the eventual success
rules, so it's not clear what the benefit of splitting those up is.
With WRS there are some routines we might want to generate code for
(cond_read_acquire() in Linux, for example) but we'd really need to dig
into those to see if it's even sane/fast.
There's another patch set to fix the lack of inline atomic routines
without breaking stuff, there were some minor comments from Kito and
IIRC I had some test failures that I needed to chase down as well.
That's a much safer fix in the short term, we'll need to deal with this
eventually but at least we can stop the libatomic issues for the distro
folks.
I expect that the pressure for a proper fix upstream (instead of a
backward compatible compromise) will increase over time (once people
start building big iron based on RISC-V and start hunting performance
bottlenecks in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics ABI
by a command line switch and promote its use. And at one point in the
future (if there are enough fixes to justify a break) the new ABI can
be enabled by default with a new flag to enable the old ABI.
Indeed we are stuck with inefficiencies in the status quo. The new ABI
option sounds like a reasonable plan going forward.
I don't think we're just stuck with the status quo, we really just need
to go through the mappings and figure out which can be made both fast
and ABI-compatible. Then we can fix those and see where we stand, maybe
it's good enough or maybe we need to introduce some sort of
compatibility break to make things faster (and/or compatible with LLVM,
where I suspect we're broken right now).
If we do need a break then I think it's probably possible to do it in
phases, where we have a middle-ground compatibility mode that works for
both the old and new mappings so distros can gradually move over as they
rebuild packages.
Issues like the libstdc++ shared_ptr/mutex fallback don't map well to
that, though. There's also some stuff like the IO fence bits that we
can probably just add an argument for now, those were likely just a bad
idea at the time and should be safe to turn off for the vast majority of
users (though those are more of an API break).
Also my understanding is that while the considerations are ABI centric, the
option to facilitate this need not be tied to canonical -mabi=ilp32, lp64d
etc. It might just be a toggle such as -matomic=legacy,2019 etc. (this is
not suggestive, just indicative). Otherwise there's another level of blowup
in multilib testing etc.
The psABI doesn't mention memory ordering at all. IIUC that's a pretty
standard hole in psABI documents, but it means we're in a grey area
here.
+Jeff, who was offering to help when the threads got crossed. I'd
punted on a lot of this in the hope Andrea could help out, as I'm not
really a memory model guy and this is pretty far down the rabbit hole.
Happy to have the help if you're offering, though, as what's there is
likely a pretty big performance issue for anyone with a reasonable
memory system.
-Vineet