On Mon Sep 1, 2025 at 2:30 AM CDT, Trevor Gross wrote:
> Thanks for taking a look so fast, LH and Jonathan.
>
> On Sun Aug 31, 2025 at 11:09 PM CDT, LIU Hao wrote:
>> On 2025-9-1 11:12, Jonathan Yong wrote:
>>> On 8/31/25 9:58 PM, Trevor Gross wrote:
>>>> For MinGW on x86-64, GCC currently passes and returns `_Float16` in
>>>> GPRs. Microsoft does not specify an official ABI for this type, but the
>>>> Windows x86-64 calling convention [1] does state the following:
>>>>
>>>>      Any floating-point and double-precision arguments in the first four
>>>>      parameters are passed in XMM0 - XMM3, depending on position.
>>>>      Floating-point values are only placed in the integer registers RCX,
>>>>      RDX, R8, and R9 when there are varargs arguments. For details, see
>>>>      Varargs. Similarly, the XMM0 - XMM3 registers are ignored when the
>>>>      corresponding argument is an integer or pointer type.
>>
>> Technically speaking, there's no 16-bit floating-point type in MSVC; they
>> use `unsigned short` in DX headers. I think this paragraph really applies
>> only to `float`, `double` and their `long double` (same as `double`).
>
> Agreed. To clarify a bit, my point is mostly that if Microsoft does
> eventually specify an ABI for `_Float16`, the above snippet would
> _probably_ be adjusted to apply. Or at least, nothing mentioned there
> would exclude f16 from using the same ABI (unlike f128, due to the
> shadow space).
>
> But of course, the actual ABI is an extension, so it is completely open-ended.
>
>>>> And
>>>>
>>>>      A scalar return value that can fit into 64 bits, including the __m64
>>>>      type, is returned through RAX. Nonscalar types including floats,
>>>>      doubles, and vector types such as __m128, __m128i, __m128d are
>>>>      returned in XMM0. The state of unused bits in the value returned in
>>>>      RAX or XMM0 is undefined.
>>
>> It looks like they have expelled floating-point types from scalar types. 
>> What a shame.
>
> I didn't quite understand this either; glad I wasn't the only one.
>
>>>> Some reading between the lines is necessary, but it seems reasonable to
>>>> expect that `_Float16` should be passed in xmm registers and returned in
>>>> xmm0. This is the same as `float` and `double` and matches the behavior
>>>> that Clang currently has for `_Float16` on its x64 MSVC and MinGW
>>>> targets (System V does the same).
>>>>
>>>> Thus, update the `HFmode` ABI to both pass and return in vector
>>>> registers.
>>>>
>>> 
>>> LH, any feedback?
>>
>> I think the idea behind this patch is correct, but not because of any
>> interpretation of the MS doc, which specifies nothing about half floats.
>> `_Float16` should be considered a GNU extension, and the fact that it is
>> passed in an integer register on x86-64 does not match x86-32 or Clang
>> (https://godbolt.org/z/3sMcPPjGd), so it is probably a mistake.
>
> As above, the doc snippets are just meant to show some precedent rather
> than claim that they necessarily apply to `_Float16`. But I think we are
> on the same page here. I was also not aware of the inconsistency on 32-bit.
>
> Would you like me to resend with a clearer commit message, or, if you
> are happy with the change, is it fine as-is? (If no resend is required,
> the comment typo that Jonathan caught could probably be fixed by the
> committer.)
>
>> Grepping for `_Float16` in headers in my MSYS2 installation finds matches
>> only in libstdc++ (because of C++23) and Qt6. Windows code that is meant to
>> be compilable with MSVC should not use this type anyway, so even though
>> this is an ABI-breaking change, it should not be too late to make it.
>
> For a bit of context, this came up when adding cross-platform f16
> support in Rust and realizing we couldn't use symbols from the system
> libgcc. I'm not aware of many applications using `_Float16` directly
> on this platform.
>
> - Trevor

Just following up here, how would you like me to proceed LH?

Related question: do you have any thoughts on the f128 return ABI?
Currently, f128 is both passed on the stack (which is effectively
required) and returned on the stack. However, i128 is returned in xmm0,
so it would be reasonable for f128 to be treated the same. As of
recently, this is what Clang does for both f128 and i128: pass on the
stack and return in xmm0.
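To make the comparison concrete, here is a minimal C sketch of the kind of functions involved. The function names are hypothetical and not necessarily what the Godbolt link below uses. Compiling it for x86_64-w64-mingw32 with something like `-O2 -S` under GCC and Clang should show where each return value ends up; the code itself only computes values and says nothing about the ABI at runtime.

```c
/* Hypothetical illustration functions (names are not from any patch).
 * Per the discussion above, GCC for x86_64-w64-mingw32 currently returns
 * __int128 in xmm0 but returns __float128 in memory (on the stack); the
 * suggestion is to return __float128 in xmm0 too, as recent Clang does. */

__int128 add_i128(__int128 a, __int128 b) {
    /* Return value: xmm0 under current GCC and Clang (per the text above). */
    return a + b;
}

__float128 add_f128(__float128 a, __float128 b) {
    /* Return value: stack under current GCC; xmm0 under recent Clang. */
    return a + b;
}
```

Comparing the two functions' assembly side by side isolates the return convention, since both take two 16-byte arguments and differ only in type.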

I was planning to submit a patch to return f128 in xmm0, but do you have
any feedback before I do so?

For reference: https://gcc.godbolt.org/z/W6PbTKWdv

- Trevor
