Hi Jonathan, Gentle ping.
Thanks, Soumya > On 5 Dec 2025, at 2:46 PM, Soumya AR <[email protected]> wrote: > > Hi Jonathan, > > Pinging again for some clarification on what is the ideal way to implement > atomic min/max in libstdc++. > > Would you prefer we check for the builtins using concepts (similar to fp > add/sub)? If so, the earlier questions in this thread still apply. > > If not, since I do have a patch that implements the builtins in GCC, would you > prefer the builtins be called directly, without a concept check? > > Patch is at: > https://gcc.gnu.org/pipermail/gcc-patches/2025-November/700786.html > > Please let me know if the general approach for this is OK. > > Thanks, > Soumya > > >> On 11 Nov 2025, at 4:31 PM, Soumya AR <[email protected]> wrote: >> >> Hi Jakub and Jonathan, >> >> Pinging on this again to ask for confirmation. >> >> As Matthew mentioned earlier, does it sound OK to have patches in this >> order: >> >>> - Have builtins used in libstdc++ (under SFINAE checks). >>> - Have builtins defined and emitting correct code for AArch64. >>> - Add the CAS pattern matching into an IFN. >> >> We'd like to continue with the 2nd option that Jakub suggested — lowering the >> generic builtin to an IFN, and converting that to a CAS loop after IPA if >> the backend does not support a direct optab. >> >> --- >> >> As for changes on the libstdc++ side, since we cannot emit a direct call to >> the >> builtins and need to resolve these appropriately under SFINAE, how do we >> handle >> implemnting the SFINAE check for __atomic_base? >> >> I would like to ideally emit less calls to the compiler builtins and thus >> have >> a single check for the builtin, the way it is currently implemented in >> __atomic_impl. >> >> What would be a neater way to tackle this? >> >> - Forward declaration of just the min/max functions from __atomic_impl and >> resolve them using the __atomic_impl implementation for __atomic_base >> - Move the SFINAE checks outside __atomic_impl. This messes with the current >> philosophy of __atomic_impl though, as I understand it >> - Resolve the integer types using a separate SFINAE check. Would result in >> code >> duplication though >> - Any other suggestion I'm missing? >> >> Lastly, how do we guard these for C++26? >> >> Thanks, >> Soumya >> >>> On 7 Nov 2025, at 4:13 PM, Matthew Malcomson <[email protected]> wrote: >>> >>> >>> >>> On 11/4/25 12:43, Jakub Jelinek wrote: >>>> External email: Use caution opening links or attachments >>>> On Tue, Nov 04, 2025 at 11:33:05AM +0000, Matthew Malcomson wrote: >>>>> We've been meaning to send this email for a while after the Cauldron. >>>>> IIUC you discussed this with Ramana there -- my understanding of what he >>>>> told me is that your main concern is with the explosion of builtins. >>>> Yeah. >>>>> Similarly, what I understand is that if we reduce the number of builtins >>>>> to >>>>> just the `__atomic_fetch_min`, `__atomic_fetch_max`, `__atomic_min_fetch` >>>>> and `__atomic_max_fetch` (which are the same builtins currently exposed by >>>>> clang) that seems more acceptable. >>>>> >>>>> I wanted to double-check that this understanding of a conversation I'm >>>>> hearing second-hand is correct. >>>> Yes. Even min vs. max could be reduced to minmax with an extra argument >>>> which determines which one it is, but I'm fine with those 4. >>>> But just those type generic ones, not the _{1,2,4,8,16} suffixes thereof >>>> etc., let the compiler figure out the signedness and size (and will it >>>> handle floating point too or not?). >>> >>> Great, sticking with those four works for us (and matches what clang has >>> already implemented). >>> >>> Yes -- we expect to make these four work for floating point too. >>> >>>>> Also I'd like to gather opinion (from you but also anyone who has an >>>>> opinion) about the following implementation design decisions: >>>>> 1) Should we still implement the libatomic functions? And still with the >>>>> unsigned/signed distinction and all sizes? >>>>> - I'd expect so, mostly for the `-fno-inline-atomics` flag. >>>> Do you really need that? Can't you just emit a CAS loop for the non-inline >>>> atomics? Because the function explosion will be there again on the >>>> libatomic side. Or at least don't add such symbols on arches which are >>>> never going to benefit from those right now (i.e. all the implementations >>>> would be CAS loops anyway). Some function explosion can be limited by only >>>> having the __atomic_fetch_{min,max}_{s,u}{1,2,4,8,16} variants and not >>>> the {min,max}_fetch ones, because that case can be implemented on the >>>> caller >>>> side by just returning the DESIRED argument separately. >>> >>> We don't need libatomic -- we were suggesting implementing it on the guess >>> that others would like it but are happy to drop that work and emit a CAS >>> loop from the compiler instead. >>> >>>>> 2) Earlier you floated the idea of using an internal function to encode >>>>> the >>>>> operation through the rest of the compiler (outside of the frontend). >>>>> Does >>>>> that approach still seem good to you? >>>> There are 2 options. >>>> Lower the type-generic builtin into a CAS loop and pattern recognize it at >>>> some late time (e.g. the widening_mul pass, certainly after IPA) into an >>>> IFN >>>> if the corresponding optab is supported. >>>> Or lower the type-generic builtin into IFN (ifns can have the min vs. max >>>> argument and derive size and sign from the DESIRED argument) and at some >>>> perhaps early (before IPA) point - forwprop? - pattern match a CAS loop >>>> into >>>> the IFN too and then ideally shortly after IPA lower the IFN back into a >>>> CAS >>>> loop if optab doesn't exist. >>>> The reason for the pre vs. post-IPA is OpenMP/OpenACC, before IPA you don't >>>> always know what the backend will be. >>> >>> Ah -- I did not know about this OpenMP/OpenACC case to watch out for -- >>> thanks for raising it. I think that shows my implementation of the CAS >>> loop expansion in my floating point fetch add/sub patches to be >>> problematic. (I had put it in the frontend because the decision about how >>> to handle floating point exceptions etc seemed a frontend-specific >>> decision). >>> >>> ------ >>> Personally I like sound of lowering to an IFN before expanding it later on >>> the principle that if we do start to miss anything with our pattern match >>> we can at least ensure we don't miss anything the user explicitly provided >>> as a fetch_min/fetch_max. >>> (I.e. the second option you suggested). >>> >>> ------ >>> For clarity, in order to get some benefit in GCC 15 we have the following >>> goals (ordered by increasing amount of code changes we need to write, also >>> somewhat corresponding to how much we're aiming to get that in by GCC 15): >>> - Have builtins used in libstdc++ (under SFINAE checks). >>> - Have builtins defined and emitting correct code for AArch64. >>> - Add the CAS pattern matching into an IFN. >>> >>> Based on the above preferences we'd like to see if you'd be happy with the >>> first patchset to not include the CAS pattern matching. >>> - N.b. This assumes going with the second option you suggested of expanding >>> the IFN into a CAS loop if optab doesn't exist rather than pattern matching >>> if optab does exist. Hence if you'd want the first option do mention. >>> >>> Thanks again, >>> Matthew >> >> >
