On 10/11/22 13:46, Christoph Müllner wrote:
On Tue, Oct 11, 2022 at 9:31 PM Palmer Dabbelt <pal...@dabbelt.com> wrote:
On Tue, 11 Oct 2022 12:06:27 PDT (-0700), Vineet Gupta wrote:
> Hi Christoph, Kito,
>
> On 5/5/21 12:36, Christoph Muellner via Gcc-patches wrote:
>> This series provides a cleanup of the current atomics implementation
>> of RISC-V:
>>
>> * PR100265: Use proper fences for atomic load/store
>> * PR100266: Provide programmatic implementation of CAS
>>
>> As both are very related, I merged the patches into one series.
>>
>> The first patch could be squashed into the following patches,
>> but I found it easier to understand the changes with it in place.
>>
>> The series has been tested as follows:
>> * Building and testing a multilib RV32/64 toolchain
>> (bootstrapped with riscv-gnu-toolchain repo)
>> * Manual review of generated sequences for GCC's atomic builtins API
>>
>> The programmatic re-implementation of CAS benefits from a REE improvement
>> (see PR100264):
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568680.html
>> If this patch is not in place, then an additional extension instruction
>> is emitted after the SC.W (in case of RV64 and CAS for uint32_t).
>>
>> Further, the new CAS code requires cbranch INSN helpers to be present:
>> https://gcc.gnu.org/pipermail/gcc-patches/2021-May/569689.html
>
> I was wondering if this patchset is blocked on some technical grounds.
There's a v3 (though I can't find all of it, so I'm not quite sure what
happened), but IIUC that still has the same fundamental problems that
all these have had: changing over to the new fence model may be an ABI
break, and the split CAS implementation doesn't ensure eventual success
(see Jim's comments). Not sure if there's other comments floating
around, though, that's just what I remember.
v3 was sent on May 27, 2022, when I rebased this on an internal tree:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595712.html
I dropped the CAS patch in v3 (issue: stack spilling under extreme
register pressure instead of erroring out) as I thought that this was
the blocker for the series.
I just learned a few weeks ago, when I asked Palmer at the GNU
Cauldron about this series, that the ABI break is the blocker.
Yeah I was confused about the ABI aspect as I didn't see any mention of
that in the public reviews of v1 and v2.
My initial understanding was that fixing something broken cannot be an
ABI break.
And that the mismatch between the implementation in 2021 and the
recommended mappings in the ratified specification from 2019 is
something that is broken. I still don't know the background here, but
I guess this assumption is incorrect from a historical point of view.
However, I'm sure that I am not the only one who assumes that the
mappings in the specification are implemented in compilers and tools.
Therefore I still consider the implementation of the RISC-V atomics in
GCC broken (at least w.r.t. the expectations of users who lack the
historical background and just read the RISC-V specification).
+Andrea, in case he has time to look at the memory model / ABI issues.
We'd still need to sort out the CAS issues, though, and it's not
abundantly clear it's worth the work: we're essentially constrained to
just emitting those fixed CAS sequences due to the eventual success
rules, so it's not clear what the benefit of splitting those up is.
With WRS there are some routines we might want to generate code for
(cond_read_acquire() in Linux, for example) but we'd really need to dig
into those to see if it's even sane/fast.
There's another patch set to fix the lack of inline atomic routines
without breaking stuff, there were some minor comments from Kito and
IIRC I had some test failures that I needed to chase down as well.
That's a much safer fix in the short term; we'll need to deal with this
eventually, but at least we can stop the libatomic issues for the distro
folks.
I expect that the pressure for a proper fix upstream (instead of a
backward compatible compromise) will increase over time (once people
start building big iron based on RISC-V and start hunting performance
bottlenecks in multithreaded workloads to be competitive).
What could be done to get some relief is to enable the new atomics ABI
by a command line switch and promote its use. And at one point in the
future (if there are enough fixes to justify a break) the new ABI can
be enabled by default with a new flag to enable the old ABI.
Indeed we are stuck with inefficiencies with the status quo. The new ABI
option sounds like a reasonable plan going forward.
Also my understanding is that while the considerations are ABI centric,
the option to facilitate this need not be tied to canonical -mabi=ilp32,
lp64d etc. It might just be a toggle such as -matomic=legacy,2019 etc.
(this is not suggestive, just indicative). Otherwise there's another
level of blowup in multilib testing etc.
-Vineet