| Field | Value |
| --- | --- |
| Issue | 64706 |
| Summary | ABI for `__m256` and `__m512` is wrong when `avx`/`avx512` is disabled globally, or enabled per-function |
| Labels | new issue |
| Assignees | |
| Reporter | chorman0773 |
Based on this discussion: https://groups.google.com/g/x86-64-abi/c/FMhl2vDl1D8
Currently, when LLVM cannot use `ymm`/`zmm` registers, it handles `__m256` and `__m512` parameters and return values as follows:
* Parameters are passed on the stack.
* Return values are split across two (`__m256`) or four (`__m512`) `xmm` registers.
Further, when the avx/avx512f features are enabled only at the function level (using `__attribute__((target))` rather than globally), it handles parameters and return values as follows:
* Parameters are passed on the stack.
* Return values are placed in a single `ymm`/`zmm` register.
In contrast, GCC's behaviour (which is apparently the correct behaviour in both cases) is:
When `ymm`/`zmm` registers are unavailable:
* Parameters are passed on the stack.
* Return values are returned in memory (return pointer in `rdi`).
When `ymm`/`zmm` registers are available at the function level (using `__attribute__((target))`), it passes and returns values exactly as it does when the feature is enabled globally via a `-m` flag such as `-mavx`.
The difference in behaviour can be demonstrated by https://godbolt.org/z/8sYcn6654.
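A minimal reproducer along these lines may help when comparing the two cases. It is a hypothetical sketch, not necessarily the code behind the Compiler Explorer link, and the function names (`ret_m256`, `ret_m256_avx`, `roundtrip_m256_ok`, ...) are made up for illustration. It assumes an x86-64 target: build it without `-mavx`/`-mavx512f` (for example with `gcc -O2 -S` and `clang -O2 -S`) and compare the assembly emitted for each function. GCC is likely to emit `-Wpsabi` warnings about wide vectors used without AVX enabled.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* Case 1: no ymm/zmm registers available (file built without -mavx and
 * -mavx512f). Per the report, GCC returns these in memory (hidden pointer
 * in %rdi), while LLVM splits the return value across 2 or 4 xmm registers. */
__m256 ret_m256(__m256 x) { return x; }
__m512 ret_m512(__m512 x) { return x; }

/* Case 2: features enabled only for these functions via the target
 * attribute. Per the report, GCC passes and returns in ymm/zmm registers as
 * it would under -mavx/-mavx512f, while LLVM reads the argument from the
 * stack but returns it in ymm0/zmm0. */
__attribute__((target("avx"))) __m256 ret_m256_avx(__m256 x) { return x; }
__attribute__((target("avx512f"))) __m512 ret_m512_avx512(__m512 x) { return x; }

/* Round-trips eight floats through ret_m256 without using AVX intrinsics,
 * so it runs on any x86-64 CPU when the file is built without -mavx.
 * Returns 1 if the value comes back unchanged. */
int roundtrip_m256_ok(void)
{
    float in[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float out[8];
    __m256 v, r;

    assert(sizeof v == sizeof in);
    memcpy(&v, in, sizeof v);
    r = ret_m256(v);
    memcpy(out, &r, sizeof out);
    return memcmp(in, out, sizeof in) == 0;
}
```

`roundtrip_m256_ok` only checks that the non-AVX path is self-consistent within one translation unit. The mismatch described here shows up when a caller built by one compiler calls a callee built by the other.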
Based on a short discussion on the x86-64 psABI mailing list, LLVM's behaviour appears to be wrong in both cases. When returning without the registers available, the value must be returned in memory: the ABI requires the second SSEUP eightbyte to be placed in the third eightbyte of `xmm0`, which does not exist, so the placement fails and the entire value is sent to memory.
In the locally-enabled case, the registers are available, so LLVM should pass the value entirely in `ymm1` and return it entirely in `ymm0`. LLVM evidently does consider them available, given that it returns in `ymm0`.
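To make the return-in-memory argument concrete, here is a simplified model of the placement rule as described above. It is a sketch under stated assumptions, not LLVM's or GCC's code. It only handles a single `__m128`/`__m256`/`__m512` value, not general aggregates. `reg_bytes` stands for the width of the widest vector register the *function* may use (16 for `xmm` only, 32 with AVX, 64 with AVX-512F). The names `vector_value_location` and `first_unplaced_eightbyte` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not compiler code: where a single vector value of
 * vec_bytes (16, 32 or 64) ends up when the function can use vector
 * registers of reg_bytes (16 = xmm only, 32 = ymm, 64 = zmm). */
typedef enum { LOC_VECTOR_REG, LOC_MEMORY } vec_location;

/* The value is split into vec_bytes / 8 eightbytes: the first is class SSE,
 * the rest are SSEUP. Eightbyte i must occupy eightbyte chunk i of the same
 * vector register. Returns the 0-based index of the first eightbyte that has
 * no chunk left, or -1 if the whole value fits. */
int first_unplaced_eightbyte(size_t vec_bytes, size_t reg_bytes)
{
    assert(vec_bytes == 16 || vec_bytes == 32 || vec_bytes == 64);
    assert(reg_bytes == 16 || reg_bytes == 32 || reg_bytes == 64);

    size_t eightbytes = vec_bytes / 8; /* 1 SSE + (eightbytes - 1) SSEUP */
    size_t chunks = reg_bytes / 8;     /* eightbyte chunks per register */
    return eightbytes > chunks ? (int)chunks : -1;
}

/* If any eightbyte cannot be placed, the whole value goes to memory: on the
 * stack for an argument, or through a hidden pointer in %rdi for a return
 * value. */
vec_location vector_value_location(size_t vec_bytes, size_t reg_bytes)
{
    return first_unplaced_eightbyte(vec_bytes, reg_bytes) < 0
               ? LOC_VECTOR_REG
               : LOC_MEMORY;
}
```

For `__m256` with only `xmm0` available, `first_unplaced_eightbyte(32, 16)` is 2. That index is the third eightbyte (the second SSEUP), so `vector_value_location(32, 16)` is `LOC_MEMORY`. In the per-function case, `reg_bytes` would follow the `target` attribute rather than the global `-m` flags, giving `vector_value_location(32, 32) == LOC_VECTOR_REG`.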
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs