epilk wrote: Hello, apologies for jumping in here so late, but this commit is changing the device function ABI on AMDGPU. For instance, for a device function returning this struct:
``` struct Empty {}; struct RetTy { int f1[3]; Empty e; int f2; }; __device__ RetTy deviceFn() { ... } ``` Clang used to emit the following IR: ``` %struct.Empty = type { i8 } %struct.RetTy = type { [3 x i32], %struct.Empty, i32 } declare %struct.RetTy @deviceFn() ;; AMDGPU backend will return RetTy in 5 vpgr32 registers ``` And now is emitting this: ``` %struct.RetTy = type { [3 x i32], [4 x i8], i32 } declare %struct.RetTy @deviceFn() ;; AMDGPU backend will return RetTy in 8 vgpr32 registers ``` It seems to me like this changes makes it more difficult to implement an ABI that are does a decomposition like this. I guess I can see a few potential paths forward here: 1) revert this patch to get the old behaviour, 2) rework it such that the number of padding bytes is minimized (e.g. a single i8 instead of [4 x i8]), or 3) add a custom clang CodeGen lowering for AMDGPU device functions that can strip away these extra padding bytes for return types. Do you have any thoughts on this? Is the AMDGPU backend relying on a misguided assumption about what the IR type for a struct will look like? https://github.com/llvm/llvm-project/pull/96422 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits