Re: [I] [C++] Performance of numeric casts [arrow]

via GitHub Wed, 05 Jun 2024 09:02:42 -0700


WillAyd commented on issue #40874:
URL: https://github.com/apache/arrow/issues/40874#issuecomment-2150431309


   I tried running smaller examples through Godbolt (Compiler Explorer) with GCC 14.1 to see whether they produce different instructions. I think this replicates the NumPy example:
   
   ```c
   void foo() {
       char *src;
       char *dst;
       while (1) {
           *(double *)dst = ((double)(*(long *)src));
           dst += sizeof(double);
           src += sizeof(long);
       }
   }
   ```
   
   producing the following asm:
   
   ```asm
   foo:
           push    rbp
           mov     rbp, rsp
   .L2:
           mov     rax, QWORD PTR [rbp-8]
           mov     rax, QWORD PTR [rax]
           pxor    xmm0, xmm0
           cvtsi2sd        xmm0, rax
           mov     rax, QWORD PTR [rbp-16]
           movsd   QWORD PTR [rax], xmm0
           add     QWORD PTR [rbp-16], 8
           add     QWORD PTR [rbp-8], 8
           jmp     .L2
   ```
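The C snippet above only works for inspecting codegen. Its pointers are never initialized and the loop never terminates. The sketch below is a runnable version of the same byte-pointer pattern. The function name `cast_i64_to_f64_bytes` and the length parameter `n` are hypothetical. It also assumes `int64_t` input, because the original `long` is only 32 bits on Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bounded, runnable variant of the NumPy-style byte-pointer loop.
// Assumption: contiguous int64_t input and double output. std::memcpy replaces
// the pointer casts so unaligned or type-punned buffers are handled safely;
// optimizing compilers typically lower these fixed-size copies to plain
// loads and stores.
void cast_i64_to_f64_bytes(const char* src, char* dst, std::size_t n) {
    assert(n == 0 || (src != nullptr && dst != nullptr));
    for (std::size_t i = 0; i < n; ++i) {
        std::int64_t in;
        std::memcpy(&in, src, sizeof in);      // load one input element
        double out = static_cast<double>(in);  // signed integer to double
        std::memcpy(dst, &out, sizeof out);    // store one output element
        dst += sizeof(double);                 // advance both byte pointers
        src += sizeof(std::int64_t);           // after the store, as the C loop does
    }
}
```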
   
   and this is representative of the C++ example:
   
   ```cpp
   void foo() {
       using OutT = double;
       using InT = long;
       const InT* in_values;
       OutT* out_values;
       for (;;) {
           *out_values++ = static_cast<OutT>(*in_values++);
       }
   }
   ```
   
   producing the following asm:
   
   ```asm
   foo():
           push    rbp
           mov     rbp, rsp
   .L2:
           mov     rax, QWORD PTR [rbp-8]
           lea     rdx, [rax+8]
           mov     QWORD PTR [rbp-8], rdx
           mov     rax, QWORD PTR [rax]
           pxor    xmm0, xmm0
           cvtsi2sd        xmm0, rax
           mov     rax, QWORD PTR [rbp-16]
           lea     rdx, [rax+8]
           mov     QWORD PTR [rbp-16], rdx
           movsd   QWORD PTR [rax], xmm0
           jmp     .L2
   ```
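The typed-pointer loop can be made runnable as a function template that bounds it and keeps the post-increments inline. This is a minimal sketch. `cast_values_inline` and its `length` parameter are hypothetical names, and this is not Arrow's actual cast kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Typed-pointer loop with the post-increments folded into the load and
// store expression, as in the C++ snippet. Hypothetical name and signature.
template <typename InT, typename OutT>
void cast_values_inline(const InT* in_values, OutT* out_values, std::size_t length) {
    assert(length == 0 || (in_values != nullptr && out_values != nullptr));
    for (std::size_t i = 0; i < length; ++i) {
        *out_values++ = static_cast<OutT>(*in_values++);
    }
}
```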
   
   Maybe the processor just handles the `add` instructions more efficiently than the `lea`/`mov` combination in the C++ example?
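Timing the loops directly would test this hypothesis better than reading the listings. The listings appear to be unoptimized output: the pointers are reloaded from stack slots relative to `rbp`, which is typical of GCC at `-O0`. It is therefore worth also timing builds with the optimization flags used in practice. Below is a minimal timing helper. `time_cast` is a hypothetical name, and the sketch assumes `std::chrono::steady_clock` is precise enough for the buffer sizes used.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical micro-benchmark helper: runs `kernel` over the whole input
// `reps` times and returns the fastest run in nanoseconds. Taking the
// minimum over repetitions reduces noise from scheduling and cache warm-up.
template <typename Kernel>
long long time_cast(Kernel kernel, const std::vector<std::int64_t>& in,
                    std::vector<double>& out, int reps) {
    assert(in.size() == out.size() && reps > 0);
    long long best = -1;
    for (int r = 0; r < reps; ++r) {
        auto start = std::chrono::steady_clock::now();
        kernel(in.data(), out.data(), in.size());
        auto stop = std::chrono::steady_clock::now();
        long long ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
        if (best < 0 || ns < best) best = ns;
    }
    return best;
}
```

To compare the variants, pass each loop as `kernel` over the same buffers and build once with `-O0` and once with `-O2`. This would show whether the `add` versus `lea`/`mov` difference survives optimization.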
   
   FWIW, if you change the C++ code to do the pointer increments in separate statements rather than inline, the generated asm matches the C example:
   
   ```cpp
   void foo() {
       using OutT = double;
       using InT = long;
       const InT* in_values;
       OutT* out_values;
       for (;;) {
           *out_values = static_cast<OutT>(*in_values);
           out_values++;
           in_values++;
       }
   }
   ```
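The separate-increment form can be made runnable the same way. `cast_values_separate` is a hypothetical name. The assembly alone does not show whether this form is actually faster than the inline-increment one, so it would still need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same loop as the inline-increment version, but the pointer increments are
// separate statements after the store. Hypothetical name and signature.
template <typename InT, typename OutT>
void cast_values_separate(const InT* in_values, OutT* out_values, std::size_t length) {
    assert(length == 0 || (in_values != nullptr && out_values != nullptr));
    for (std::size_t i = 0; i < length; ++i) {
        *out_values = static_cast<OutT>(*in_values);
        out_values++;
        in_values++;
    }
}
```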


