On Wednesday, 4 March 2026 15:09:55 Pacific Standard Time Trevor Gross wrote:
> It indeed is not maximally efficient, but any `float` or `double` code
> is already paying a similar (or slightly higher) cost for %st0 return
> right? At least if any operations are done in XMM registers, which
> Clang likes to do whenever SSE2 is available (or GCC with options).

Indeed, but that's an ABI issue now. There was no SSE when the ABI was
created. Even just enforcing the stack alignment for SSE about 25 years ago
was a problem.

A compiler might be able to decide to use SSE or x87 depending on the cost of
the transfer at the end. For most simple operations, x87 is as fast as SSE,
but complex code won't be due to the use of stack-based registers.

> The compatibility issues using XMM doesn't seem necessarily worth the
> cycle savings specifically for _Float16, given the cost for other
> floats at non-inlineable function boundaries. Especially when many ops
> with the type require a f16<->f32 conversion, which itself doesn't
> have the call overhead (if supported).

I disagree. The cost is additional, regardless of how the implementation of
FP16 code is done, except if it were done entirely emulated in SW.

> > So there are two questions to be answered, one of which has already been:
> >
> > 1) does FP16 support require SSE?
> >
> > H.J. stated it does in the discussion you linked to and no one argued.
>
> I took Joseph's first reply on the thread to be an expression of some
> disagreement, followed by discussion about efficient GPR<->XMM to
> support a GPR return that didn't exactly come to a conclusion. But it
> is possible I am misreading here, none of this is stated explicitly.

I took that as a question, to which H.J. replied saying "it is" and no one
argued. You seem to wish to reopen this discussion.

> (Joseph's email address from that thread bounced, added a new one here.)

The llvm-dev one too, so I dropped it.

> At the ABI level the choice isn't between two performance optimization
> goals, but rather between optimization and compatibility. The current
> _Float16 ABI does lean toward optimization (as much as possible with
> stack passing), but this makes it the only C-specified type to not be
> compatible with baseline i386.

Indeed. But what's the harm?

> > So I'd argue it's not worth optimising for them, and it's far better to
> > allow the best performance when one has HW-backed conversion instructions
> > (and for GCC, using -mfpmath=sse).
>
> This is a bit of a tangent but I think it would be much more useful to
> have an ABI-changing flag that raises the baseline to SSE2 and returns
> _Float16, float, and double in XMM. That gets the return ABI
> performance improvement for all float types, not just _Float16, and
> effectively resolves a whole class of issues for x86-32 users like
> [1], [2], [3].

Doesn't sseregparm do that?
https://i386.godbolt.org/z/Pxj6YM365

But the question here is whether we need an ABI-breaking option to be able to
use _Float16 efficiently, given that the type itself was only introduced
after SSE already existed.

> > Are you asking to reopen the "requires SSE" discussion?
>
> That is my interest here, to the extent that is possible at this point.

Ok, but why?

While AVX is missing in some hardware still being sold, SSE has been present
in everything for two decades, with the exception of the Intel Quark
microcontroller (and that is no longer commercialised). Or am I missing
something relevant that would be excluded from using _Float16?

I suppose there are still people with old Pentium III or older systems still
running, but are they updating software for them? Do they have a *new* need
for _Float16?

Software that *requires* _Float16 is incredibly rare, since that was a
non-standard type before C++23. Instead, software that needs the type either
has its own emulation, deployed for years, or degrades gracefully and does
not require the type in order to compile.

-- 
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
  Principal Engineer - Intel Data Center - Platform & Sys. Eng.
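
To illustrate the "graceful degradation" pattern mentioned above, here is a rough sketch in C, not taken from any particular project. The names half_t, half_from_bits, half_to_float and half_bits_to_float are hypothetical. It also assumes that the compiler defines __FLT16_MANT_DIG__ only when _Float16 is actually usable, and that on i386 this coincides with __SSE2__ being defined; some compiler versions may not follow that, so treat the detection as an approximation.

```c
#include <stdint.h>
#include <string.h>

/* Software fallback: decode IEEE 754 binary16 bits into a float. */
static float half_bits_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {            /* infinity or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {         /* normal: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {         /* signed zero */
        bits = sign;
    } else {                       /* subnormal: normalise the mantissa */
        exp = 113;
        while (!(man & 0x400u)) {
            man <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((man & 0x3FFu) << 13);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Assumption: these macros are a good enough proxy for "_Float16 works". */
#if defined(__FLT16_MANT_DIG__) && (!defined(__i386__) || defined(__SSE2__))
typedef _Float16 half_t;           /* compiler-provided type */
static float half_to_float(half_t h) { return (float)h; }
#else
typedef uint16_t half_t;           /* raw bits, converted in software */
static float half_to_float(half_t h) { return half_bits_to_float(h); }
#endif

/* Build a half_t from its bit pattern; works with either definition. */
static half_t half_from_bits(uint16_t b)
{
    half_t h;
    memcpy(&h, &b, sizeof h);
    return h;
}
```

With this shape, callers only ever touch half_t and the two helpers, so the same source builds on a baseline i386 target (software path) and on an SSE2-capable one (native path). For example, half_to_float(half_from_bits(0x3E00)) yields 1.5f on both.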
