WillAyd commented on issue #40874:
URL: https://github.com/apache/arrow/issues/40874#issuecomment-2150431309
I tried running smaller examples through godbolt with gcc 14.1 to see whether
they produce different instructions. I think this replicates the NumPy example:
```c
void foo() {
  char *src;
  char *dst;
  while (1) {
    *(double *)dst = ((double)(*(long *)src));
    dst += sizeof(double);
    src += sizeof(long);
  }
}
```
producing the following asm:
```asm
foo:
        push rbp
        mov rbp, rsp
.L2:
        mov rax, QWORD PTR [rbp-8]
        mov rax, QWORD PTR [rax]
        pxor xmm0, xmm0
        cvtsi2sd xmm0, rax
        mov rax, QWORD PTR [rbp-16]
        movsd QWORD PTR [rax], xmm0
        add QWORD PTR [rbp-16], 8
        add QWORD PTR [rbp-8], 8
        jmp .L2
```
and this typifies the C++ example:
```cpp
void foo() {
  using OutT = double;
  using InT = long;
  const InT* in_values;
  OutT* out_values;
  for (;;) {
    *out_values++ = static_cast<OutT>(*in_values++);
  }
}
```
producing the following asm:
```asm
foo():
        push rbp
        mov rbp, rsp
.L2:
        mov rax, QWORD PTR [rbp-8]
        lea rdx, [rax+8]
        mov QWORD PTR [rbp-8], rdx
        mov rax, QWORD PTR [rax]
        pxor xmm0, xmm0
        cvtsi2sd xmm0, rax
        mov rax, QWORD PTR [rbp-16]
        lea rdx, [rax+8]
        mov QWORD PTR [rbp-16], rdx
        movsd QWORD PTR [rax], xmm0
        jmp .L2
```
Maybe the processor just handles the add instructions more effectively than
the lea / mov combination in the C++ example?
FWIW if you change the C++ code to increment the pointers in separate
statements rather than inline, the generated asm matches the C example:
```cpp
void foo() {
  using OutT = double;
  using InT = long;
  const InT* in_values;
  OutT* out_values;
  for (;;) {
    *out_values = static_cast<OutT>(*in_values);
    out_values++;
    in_values++;
  }
}
```
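One caveat on the snippets above: they use uninitialized local pointers and an infinite loop, and the listings reload both pointers from the stack on every iteration, which is what unoptimized output typically looks like. A minimal sketch for comparing the two increment styles with optimization enabled (e.g. `-O2`) might pass the pointers and a length as parameters instead. The names `cast_inline`, `cast_separate`, and `check_styles_agree` are hypothetical, chosen for illustration, and are not taken from NumPy or Arrow; I have not checked what asm this produces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical names for illustration only; not NumPy or Arrow code.
using InT = std::int64_t;
using OutT = double;

// Pointer increments done inline, as in the first C++ snippet.
void cast_inline(const InT* in_values, OutT* out_values, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    *out_values++ = static_cast<OutT>(*in_values++);
  }
}

// Pointer increments done in separate statements, as in the C snippet.
void cast_separate(const InT* in_values, OutT* out_values, std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    *out_values = static_cast<OutT>(*in_values);
    out_values++;
    in_values++;
  }
}

// Sanity check: both styles must write identical results for the same input.
void check_styles_agree(const std::vector<InT>& input) {
  std::vector<OutT> a(input.size());
  std::vector<OutT> b(input.size());
  cast_inline(input.data(), a.data(), input.size());
  cast_separate(input.data(), b.data(), input.size());
  assert(a == b);
}
```

Pasting the two `cast_*` functions into godbolt with and without an optimization flag should show whether the `lea` / `mov` versus `add` difference survives once the pointers can live in registers.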