epilk wrote:

Hello, apologies for jumping in here so late, but this commit is changing the 
device function ABI on AMDGPU. For instance, for a device function returning 
this struct:

```
struct Empty {};
struct RetTy {
  int f1[3];
  Empty e;
  int f2;
};

__device__ RetTy deviceFn() { ... }
```

Clang used to emit the following IR:

```
%struct.Empty = type { i8 }
%struct.RetTy = type { [3 x i32], %struct.Empty, i32 }
declare %struct.RetTy @deviceFn() ;; AMDGPU backend will return RetTy in 5 
vpgr32 registers
```

And now is emitting this:

```
%struct.RetTy = type { [3 x i32], [4 x i8], i32 }
declare %struct.RetTy @deviceFn() ;; AMDGPU backend will return RetTy in 8 
vgpr32 registers
```

It seems to me like this changes makes it more difficult to implement an ABI 
that are does a decomposition like this. I guess I can see a few potential 
paths forward here: 1) revert this patch to get the old behaviour, 2) rework it 
such that the number of padding bytes is minimized (e.g. a single i8 instead of 
[4 x i8]), or 3) add a custom clang CodeGen lowering for AMDGPU device 
functions that can strip away these extra padding bytes for return types. Do 
you have any thoughts on this? Is the AMDGPU backend relying on a misguided 
assumption about what the IR type for a struct will look like?

https://github.com/llvm/llvm-project/pull/96422
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to