Hello all,

I am interested in revisiting the return ABI of _Float16 on i386. Currently it is returned in xmm0, meaning SSE is required for the type. This is rather inconvenient when _Float16 is otherwise quite well supported. Compilers need to choose between hacking together a custom ABI that works on the baseline and passing the burden on to users to gate everything.

Is there any interest in adjusting the specification such that _Float16 is returned in a GPR rather than an SSE register? This was brought up before in the thread at [1], with the concern about efficient 16-bit moves between GPRs or memory and XMM. This doesn't seem to be relevant, however, given there isn't any reason to have a _Float16 in XMM unless F16C is available, implying SSE2 and SSE4.1 for PINSRW and PEXTRW to/from memory (unless I am missing something?).

A sample patch to the psABI is below. Needless to say, there are compatibility concerns that come with such a change, but given that workarounds already exist (e.g. in LLVM), it seems worth considering whether something should be codified to make this simpler for everyone.

Best regards,
Trevor

[1]: https://inbox.sourceware.org/gcc-patches/[email protected]/

(some CCs added from the linked discussion)

--- patch follows ---

From 1af72db89f9a10b93569fa0b9f64f65f2dd73334 Mon Sep 17 00:00:00 2001
From: Trevor Gross <[email protected]>
Date: Fri, 23 Jan 2026 21:11:43 +0000
Subject: [PATCH] Return _Float16 and _Complex _Float16 in GPRs

Currently the ABI specifies that _Float16 is to be passed on the stack
and returned in xmm0, meaning SSE is required to support the type.
Adjust both _Float16 and _Complex _Float16 to return in eax, dropping
the SSE requirement.

This has the benefit of making _Float16 ABI-compatible with `short`.
---
 low-level-sys-info.tex | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 0015c8c..a2d8d6d 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -384,8 +384,7 @@ of some 64bit return types & No \\
 \ESI & callee-saved register & yes \\
 \EDI & callee-saved register & yes \\
 \reg{xmm0} & scratch register; also used to pass the first \code{__m128}
-    parameter and return \code{__m128}, \code{_Float16},
-    \code{_Complex _Float16} & No \\
+    parameter and return \code{__m128} & No \\
 \reg{ymm0} & scratch register; also used to pass the first \code{__m256}
     parameter and return \code{__m256} & No \\
 \reg{zmm0} & scratch register; also used to pass the first \code{__m512}
@@ -472,7 +471,11 @@ and \texttt{unions}) are always returned in memory.
     & \texttt{\textit{any-type} *} & \EAX \\
     & \texttt{\textit{any-type} (*)()} & \\
     \hline
-    & \texttt{_Float16} & \reg{xmm0} \\
+    & \texttt{_Float16} & \reg{ax} \\
+    & & The upper 16 bits of \EAX are undefined.
+        The caller must not \\
+    & & rely on these being set in a predefined
+        way by the called function. \\
     \cline{2-3}
     & \texttt{float} & \reg{st0} \\
     \cline{2-3}
@@ -484,7 +487,7 @@ and \texttt{unions}) are always returned in memory.
     \cline{2-3}
     & \texttt{__float128} & memory \\
     \hline
-    & \texttt{_Complex _Float16} & \reg{xmm0} \\
+    & \texttt{_Complex _Float16} & \reg{eax} \\
     & & The real part is returned in bits 0..15.
         The imaginary part is returned \\
     & & in bits 16..31.\\
-- 
2.50.1 (Apple Git-155)
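
For illustration only, here is a rough C sketch of the register layout the patch above describes. It models eax as a plain uint32_t and works on raw binary16 bit patterns, so it does not depend on compiler support for _Float16. All helper names are made up for this example and are not part of the psABI or of any compiler. The constants 0x3C00 and 0xC000 are the IEEE 754 binary16 encodings of 1.0 and -2.0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the proposed return convention.
 * "eax" here is just a uint32_t standing in for the register value. */

/* _Float16 return: the value lives in ax. The upper 16 bits of eax are
 * undefined, so the caller must mask them off rather than trust them. */
static uint16_t f16_bits_from_eax(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

/* _Complex _Float16 return: real part in bits 0..15,
 * imaginary part in bits 16..31 of eax. */
static uint32_t pack_complex_f16(uint16_t re_bits, uint16_t im_bits)
{
    return (uint32_t)re_bits | ((uint32_t)im_bits << 16);
}

static uint16_t complex_f16_real_bits(uint32_t eax)
{
    return (uint16_t)(eax & 0xFFFFu);
}

static uint16_t complex_f16_imag_bits(uint32_t eax)
{
    return (uint16_t)(eax >> 16);
}

/* Small self-check of the layout: 1.0 - 2.0i as a _Complex _Float16. */
static void check_layout_example(void)
{
    uint32_t eax = pack_complex_f16(0x3C00u, 0xC000u);
    assert(complex_f16_real_bits(eax) == 0x3C00u);
    assert(complex_f16_imag_bits(eax) == 0xC000u);

    /* Garbage in the upper half must not affect a scalar _Float16 return. */
    assert(f16_bits_from_eax(0xDEAD3C00u) == 0x3C00u);
}
```

The scalar case reads only the low 16 bits, which is the same thing a caller does for a `short` return value. That is the sense in which the commit message calls the two ABI-compatible.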
