henrybw wrote:
> I don't think this patch really gets to the root of the problem.
>
> Really, the issue is this `return ABIArgInfo::getDirect();` at the end of
> classifyRegCallStructTypeImpl(). It's saying "use the LLVM IR type of the
> struct"... but we have no idea what the type will actually look like;
> conversion from C struct types to LLVM IR types depends on a bunch of
> arbitrary and unstable choices in CodeGenTypes. I think the code needs to be
> rewritten to get rid of this: it needs to explicitly construct the
> appropriate return type, like X86_64ABIInfo::classifyArgumentType does.
@efriedma-quic I agree, thanks for suggesting that.
I've been rewriting `classifyRegCallStructTypeImpl` accordingly; however, in
doing so, I found a couple of bugs that interact with each other in a way that
causes the `ret_reg_reused` test to accidentally pass:
```c
struct HFA6 { __m128 f[4]; };
struct HFA6 __regcall ret_reg_reused(struct HFA6 a, struct HFA6 b,
                                     struct HFA6 c, struct HFA6 d) {
  struct HFA6 h;
  return h;
}
// X86: define dso_local x86_regcallcc %struct.HFA6
@__regcall3__ret_reg_reused(<4 x float> %a.0, <4 x float> %a.1, <4 x float>
%a.2, <4 x float> %a.3, <4 x float> %b.0, <4 x float> %b.1, <4 x float> %b.2,
<4 x float> %b.3, ptr inreg noundef dead_on_return %c, ptr inreg noundef
dead_on_return %d)
// Win64: define dso_local x86_regcallcc %struct.HFA6
@__regcall3__ret_reg_reused(<4 x float> %a.0, <4 x float> %a.1, <4 x float>
%a.2, <4 x float> %a.3, <4 x float> %b.0, <4 x float> %b.1, <4 x float> %b.2,
<4 x float> %b.3, <4 x float> %c.0, <4 x float> %c.1, <4 x float> %c.2, <4 x
float> %c.3, <4 x float> %d.0, <4 x float> %d.1, <4 x float> %d.2, <4 x float>
%d.3)
// Lin64: define dso_local x86_regcallcc %struct.HFA6
@__regcall3__ret_reg_reused([4 x <4 x float>] %a.coerce, [4 x <4 x float>]
%b.coerce, [4 x <4 x float>] %c.coerce, [4 x <4 x float>] %d.coerce)
```
1. Currently, `classifyRegCallStructTypeImpl` counts the registers a struct needs by passing each of its fields individually to `classifyArgumentType`. However, `classify` returns NoClass for the `__m128 f[4]` array type when it is classified on its own. Normally `classify` only ever sees this array as a field inside a struct, in which case it falls back to Memory; but because `classifyRegCallStructTypeImpl` passes the bare array type, the array field gets incorrectly classified as Ignore and assigned 0 registers.
2. For regcall specifically, `X86_64ABIInfo::computeInfo` doesn't reset
`FreeSSERegs` between classifying the return type and argument types (on
Windows, `WinX86_64ABIInfo::computeInfo` resets `FreeSSERegs` to 16 before
classifying regcall arguments). Since array fields "need" 0 SSE registers, all
these struct arguments accidentally "fit" in SSE registers.
With the array-field classification fixed, the return type and each argument type now correctly require 4 SSE registers apiece. However, because `X86_64ABIInfo::computeInfo` deducts 4 SSE registers for the return type without resetting, only 12 free SSE registers remain for the arguments, causing `struct HFA6 d` to spill to the stack.
The same mismatch is visible today if you replace the 4-element array with 4 individual fields. The generated code should be identical, but it isn't (the last 4-field struct argument spills and is passed `byval`): https://godbolt.org/z/sxn61eaWW
Resetting `FreeSSERegs = 16` for regcall after return-type classification in `X86_64ABIInfo::computeInfo` (matching `WinX86_64ABIInfo::computeInfo`) makes all four arguments fit in SSE registers for both struct layouts, and the CodeGen and CodeGenCXX tests still pass. However, this would be an ABI-breaking change for regcall on SysV x86-64.
---
This is starting to grow the scope of the original patch. Should I include
these fixes in this patch? Or would you prefer that I split those out into a
follow-on patch to review separately? (I can also file bugs for the above
issues as needed.)
One other thing: this refactor changes some of the types in the IR (e.g. `%struct.HFA6` becomes `{ [4 x <4 x float>] }`, and `%struct.HFA2` becomes `{ double, double }`). These are the explicit coerce-to types that replace the implicit types produced by the previous CodeGenTypes conversion. The generated machine code ends up the same, but the IR signatures change: is that expected/acceptable, or should the coerce-to types still match the old IR types where possible?
https://github.com/llvm/llvm-project/pull/187134
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits