henrybw wrote:

> I don't think this patch really gets to the root of the problem.
>
> Really, the issue is this `return ABIArgInfo::getDirect();` at the end of 
> classifyRegCallStructTypeImpl(). It's saying "use the LLVM IR type of the 
> struct"... but we have no idea what the type will actually look like; 
> conversion from C struct types to LLVM IR types depends on a bunch of 
> arbitrary and unstable choices in CodeGenTypes. I think the code needs to be 
> rewritten to get rid of this: it needs to explicitly construct the 
> appropriate return type, like X86_64ABIInfo::classifyArgumentType does.

@efriedma-quic I agree, thanks for suggesting that.

I've been rewriting `classifyRegCallStructTypeImpl` accordingly; however, in 
doing so, I found a couple of bugs that interact with each other in a way that 
causes the `ret_reg_reused` test to accidentally pass:

```c
struct HFA6 { __m128 f[4]; };
struct HFA6 __regcall ret_reg_reused(struct HFA6 a, struct HFA6 b, struct HFA6 c, struct HFA6 d) { struct HFA6 h; return h; }
// X86: define dso_local x86_regcallcc %struct.HFA6 @__regcall3__ret_reg_reused(<4 x float> %a.0, <4 x float> %a.1, <4 x float> %a.2, <4 x float> %a.3, <4 x float> %b.0, <4 x float> %b.1, <4 x float> %b.2, <4 x float> %b.3, ptr inreg noundef dead_on_return %c, ptr inreg noundef dead_on_return %d)
// Win64: define dso_local x86_regcallcc %struct.HFA6 @__regcall3__ret_reg_reused(<4 x float> %a.0, <4 x float> %a.1, <4 x float> %a.2, <4 x float> %a.3, <4 x float> %b.0, <4 x float> %b.1, <4 x float> %b.2, <4 x float> %b.3, <4 x float> %c.0, <4 x float> %c.1, <4 x float> %c.2, <4 x float> %c.3, <4 x float> %d.0, <4 x float> %d.1, <4 x float> %d.2, <4 x float> %d.3)
// Lin64: define dso_local x86_regcallcc %struct.HFA6 @__regcall3__ret_reg_reused([4 x <4 x float>] %a.coerce, [4 x <4 x float>] %b.coerce, [4 x <4 x float>] %c.coerce, [4 x <4 x float>] %d.coerce)
```

1. Currently, `classifyRegCallStructTypeImpl` counts the registers a struct needs by passing each field's type individually to `classifyArgumentType`. However, `classify` returns NoClass for the `__m128 f[4]` array type on its own. Normally, when `classify` encounters this array as a field inside a struct, it falls back to Memory; but because `classifyRegCallStructTypeImpl` passes the bare array type, the array field is incorrectly classified as Ignore and assigned 0 registers.

2. For regcall specifically, `X86_64ABIInfo::computeInfo` doesn't reset 
`FreeSSERegs` between classifying the return type and argument types (on 
Windows, `WinX86_64ABIInfo::computeInfo` resets `FreeSSERegs` to 16 before 
classifying regcall arguments). Since array fields "need" 0 SSE registers, all 
these struct arguments accidentally "fit" in SSE registers.
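A toy model of the mis-classification in point 1 may make the interaction clearer. This is not Clang's actual code; the enum and helper functions below are illustrative stand-ins for the behavior described above:

```c
#include <assert.h>

/* Illustrative stand-in for the relevant classification results. */
enum Class { NoClass, SSE, Memory };

/* classify() on the bare `__m128 f[4]` array type returns NoClass;
 * only when the array is encountered as a field inside a struct does
 * it fall back to Memory. */
static enum Class classify_m128_array(int seen_as_struct_field) {
  return seen_as_struct_field ? Memory : NoClass;
}

/* classifyRegCallStructTypeImpl charges SSE registers per field.
 * A field that classifies as NoClass ends up as Ignore and charges
 * 0 registers, even though the array really needs 4 SSE registers. */
static int sse_regs_charged(enum Class c) {
  return c == SSE ? 4 : 0;
}
```

Because `classifyRegCallStructTypeImpl` passes the array type directly (the `seen_as_struct_field == 0` path above), every HFA6 field is charged 0 registers, which is why all four struct arguments appear to fit.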

With array field classification fixed, the return type and each argument correctly require 4 SSE registers. However, because `X86_64ABIInfo::computeInfo` deducts 4 SSE registers for the return type without resetting, only 12 free SSE registers remain for the arguments, causing `struct HFA6 d` to spill to the stack.

This second bug is already observable if you replace the 4-element array with 4 individual fields. The generated code should be identical; however, it isn't (the last 4-field struct argument spills and becomes `byval`): https://godbolt.org/z/sxn61eaWW
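For reference, the 4-field variant I am comparing looks like the following. (To keep this snippet self-contained and portable, it uses a vector typedef in place of `__m128` and omits `__regcall`; the struct names are mine.)

```c
#include <assert.h>

/* Portable stand-in for __m128 so this compiles without <immintrin.h>. */
typedef float v4sf __attribute__((vector_size(16)));

struct HFA6_array  { v4sf f[4]; };           /* the original test struct      */
struct HFA6_fields { v4sf f0, f1, f2, f3; }; /* same layout, four fields      */

/* Both structs are 64 bytes of four 16-byte vectors, so a regcall
 * function taking either should lower to the same calling convention. */
_Static_assert(sizeof(struct HFA6_array) == sizeof(struct HFA6_fields),
               "layouts must match");
```

Since the layouts are identical, the differing lowering on the godbolt link is purely an artifact of the two bugs above.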

Resetting `FreeSSERegs = 16` for regcall after return-type classification in `X86_64ABIInfo::computeInfo` (matching `WinX86_64ABIInfo::computeInfo`) lets all four arguments fit in SSE registers for both structs, and the CodeGen and CodeGenCXX tests still pass. However, this would be a breaking ABI change for regcall on SysV x86-64.
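The register accounting described above can be sketched as a toy model (hypothetical function and parameter names; the real logic lives in `X86_64ABIInfo::computeInfo`):

```c
#include <assert.h>

/* Toy model of regcall SSE register accounting -- illustrative only.
 * Returns how many arguments land in SSE registers; the rest spill. */
static int count_args_in_sse_regs(int num_args, int regs_per_arg,
                                  int ret_regs, int reset_after_return) {
  int free_sse = 16 - ret_regs;   /* return type is classified first */
  if (reset_after_return)
    free_sse = 16;                /* WinX86_64ABIInfo::computeInfo behavior */
  int in_regs = 0;
  for (int i = 0; i < num_args; i++) {
    if (regs_per_arg <= free_sse) {
      free_sse -= regs_per_arg;
      in_regs++;
    }                             /* else: argument spills to memory */
  }
  return in_regs;
}
```

With four HFA6 arguments needing 4 SSE registers each and a return type that also needs 4, `count_args_in_sse_regs(4, 4, 4, 0)` places only three arguments in registers (so `d` spills), while resetting as the Windows path does places all four.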

---

This is starting to grow the scope of the original patch. Should I include 
these fixes in this patch? Or would you prefer that I split those out into a 
follow-on patch to review separately? (I can also file bugs for the above 
issues as needed.)

One other thing: this refactor changes some types in the IR (e.g. `%struct.HFA6` becomes `{ [4 x <4 x float>] }`, `%struct.HFA2` becomes `{ double, double }`). These are the explicit coerce-to types replacing the implicit types produced by the previous CodeGenTypes conversion. The generated machine code ends up the same, but the IR signatures change: is this expected/acceptable, or should the coerce-to types still match the old IR types where possible?

https://github.com/llvm/llvm-project/pull/187134
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
