On Wed, Mar 4, 2026 at 3:34 AM Trevor Gross <[email protected]> wrote:
>
> Hello all,
>
> I am interested in revisiting the return ABI of _Float16 on i386.
> Currently it is returned in xmm0, meaning SSE is required for the type.
> This is rather inconvenient when _Float16 is otherwise quite well
> supported. Compilers need to pick between hacking together a custom ABI
> that works on the baseline, or passing the burden on to users to gate
> everything.
>
> Is there any interest in adjusting the specification such that _Float16
> is returned in a GPR rather than SSE?

Changing ABIs at any time is wrong. Why not change Rust to follow the ABI?
Why not have fp16 as a conditionally supported feature in Rust, like any
other language?
Changing the ABI requires multilib or a flag day, and I doubt distros
want either of those at this stage, especially for 32-bit x86, which has
had a stable ABI for the last 20+ years.

Thanks,
Andrew

>
> This was brought up before in the thread at [1], with the concern about
> efficient 16-bit moves between GPRs or memory and XMM. This doesn't seem
> to be relevant, however, given there isn't any reason to have a _Float16
> in XMM unless F16C is available, implying SSE2 and SSE4.1 for PINSRW and
> PEXTRW to/from memory (unless I am missing something?).
>
> A sample patch to the psABI is below. Needless to say, there are
> compatibility concerns that come from a change, but given workarounds
> already exist (e.g. in LLVM), it seems worth considering whether
> something should be codified to make this simpler for everyone.
>
> Best regards,
> Trevor
>
> [1]:
> https://inbox.sourceware.org/gcc-patches/[email protected]/
>
> (some CCs added from the linked discussion)
>
> --- patch follows ---
>
> From 1af72db89f9a10b93569fa0b9f64f65f2dd73334 Mon Sep 17 00:00:00 2001
> From: Trevor Gross <[email protected]>
> Date: Fri, 23 Jan 2026 21:11:43 +0000
> Subject: [PATCH] Return _Float16 and _Complex _Float16 in GPRs
>
> Currently the ABI specifies that _Float16 is to be passed on the stack
> and returned in xmm0, meaning SSE is required to support the type.
> Adjust both _Float16 and _Complex _Float16 to return in eax, dropping
> the SSE requirement.
>
> This has the benefit of making _Float16 ABI-compatible with `short`.
> ---
>  low-level-sys-info.tex | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
> index 0015c8c..a2d8d6d 100644
> --- a/low-level-sys-info.tex
> +++ b/low-level-sys-info.tex
> @@ -384,8 +384,7 @@ of some 64bit return types & No \\
>  \ESI & callee-saved register & yes \\
>  \EDI & callee-saved register & yes \\
>  \reg{xmm0} & scratch register; also used to pass the first \code{__m128}
> - parameter and return \code{__m128}, \code{_Float16},
> - \code{_Complex _Float16} & No \\
> + parameter and return \code{__m128} & No \\
>  \reg{ymm0} & scratch register; also used to pass the first \code{__m256}
>  parameter and return \code{__m256} & No \\
>  \reg{zmm0} & scratch register; also used to pass the first \code{__m512}
> @@ -472,7 +471,11 @@ and \texttt{unions}) are always returned in memory.
>  & \texttt{\textit{any-type} *} & \EAX \\
>  & \texttt{\textit{any-type} (*)()} & \\
>  \hline
> - & \texttt{_Float16} & \reg{xmm0} \\
> + & \texttt{_Float16} & \reg{ax} \\
> + & & The upper 16 bits of \EAX are undefined.
> + The caller must not \\
> + & & rely on these being set in a predefined
> + way by the called function. \\
>  \cline{2-3}
>  & \texttt{float} & \reg{st0} \\
>  \cline{2-3}
> @@ -484,7 +487,7 @@ and \texttt{unions}) are always returned in memory.
>  \cline{2-3}
>  & \texttt{__float128} & memory \\
>  \hline
> - & \texttt{_Complex _Float16} & \reg{xmm0} \\
> + & \texttt{_Complex _Float16} & \reg{eax} \\
>  & & The real part is returned in bits 0..15. The imaginary part is
>  returned \\
>  & & in bits 16..31.\\
> --
> 2.50.1 (Apple Git-155)
