| Field | Value |
|---|---|
| Issue | 61592 |
| Summary | Question: How does clang assume alignment for extern C arrays? |
| Labels | new issue |
| Assignees | |
| Reporter | knzivid |

I am investigating a segmentation fault in C++. I am fairly certain it is a bug in my own code, but I would like to understand why clang behaves differently from GCC here. I am a beginner at reading assembly, so please correct me if I have made any false assumptions :)

Consider this snippet (https://www.godbolt.org/z/PKroaEKoo):
```cpp
#include <cstdint>
#include <string>

using T = unsigned char;

template<size_t n>
int asString(const T (&t)[n])
{
    return std::string(t, t + n).at(0);
}

extern T data[14717];

int main()
{
    return asString(data);
}
```
Here, `data` comes from an object file generated by a binutils derivative. The alignment of `data` is not explicitly specified, and the `.data` section that holds it has an alignment of 1 (the `Al` column):
```
$ readelf --sections --symbols data.o --wide
There are 5 section headers, starting at offset 0x3a70:

Section Headers:
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
  [ 1] .data             PROGBITS        0000000000000000 000040 003981 00  WA  0   0  1
  [ 2] .symtab           SYMTAB          0000000000000000 0039c8 000048 18      3   1  8
  [ 3] .strtab           STRTAB          0000000000000000 003a10 00003d 00      0   0  1
  [ 4] .shstrtab         STRTAB          0000000000000000 003a4d 000021 00      0   0  1
```
In the assembly generated on godbolt, the copy is unrolled into `movups` instructions. In the internal binary I am investigating, however, I see a mix of `movaps` and `movups` where the string constructor is called. There, the extern C array is loaded into XMM registers with `movaps` and stored to the destination with `movups`:
```
0x00215090 movaps xmm0, xmmword [rcx + rsi - 0x50]
0x00215095 movaps xmm1, xmmword [rcx + rsi - 0x40]
0x0021509a movups xmmword [rax + rsi - 0x50], xmm0
0x0021509f movups xmmword [rax + rsi - 0x40], xmm1
0x002150a4 movaps xmm0, xmmword [rcx + rsi - 0x30]
0x002150a9 movaps xmm1, xmmword [rcx + rsi - 0x20]
0x002150ae movups xmmword [rax + rsi - 0x30], xmm0
0x002150b3 movups xmmword [rax + rsi - 0x20], xmm1
0x002150b8 movaps xmm0, xmmword [rcx + rsi - 0x10]
0x002150bd movaps xmm1, xmmword [rcx + rsi]
0x002150c1 movups xmmword [rax + rsi - 0x10], xmm0
0x002150c6 movups xmmword [rax + rsi], xmm1
0x002150ca add rsi, 0x60
0x002150ce cmp rsi, 0x39b0 ; case.0x553b30.2
0x002150d5 jne 0x215090 ; likely
```
**Question 1**: How does clang decide between `movaps` and `movups`? According to [this SO answer](https://stackoverflow.com/a/61197816), "the x86-64 System V ABI requires that static arrays of 16 bytes or larger be aligned by 16". [movaps](https://www.felixcloutier.com/x86/movaps) with an XMM register also requires its memory operand to be 16-byte aligned. Are these the only constraints?

**Question 2**: The minimal example uses only `movups`, but my internal binary uses a mix of `movups` and `movaps`, even though both are built from essentially the same code. Do you have any suggestions on how to debug this?

**Question 3**: In the godbolt example, changing the alias to `using T = char` makes the compiler call `memcpy` instead. Does this mean that declaring the extern C array as `unsigned char` produces better code, or is the difference caused by something else? I understand that "better" is subjective here; I want to understand the reasons behind the difference in generated code.