jorisvandenbossche commented on issue #40874:
URL: https://github.com/apache/arrow/issues/40874#issuecomment-2025511001

   Looking at the inner loops in both Arrow and numpy, they seem to be quite 
similar. In numpy, almost all of the time is spent in 
`_aligned_contig_cast_long_to_double`, which essentially boils down to:
   
   ```c
       npy_intp N = dimensions[0];
       char *src = args[0], *dst = args[1];
   
       while (N--) {
           *(npy_double *)dst = ((npy_double)(*(npy_long *)src));
           dst += sizeof(npy_double);
           src += sizeof(npy_long);
       }
   ```
   
   and in Arrow, almost all time is spent in `CastPrimitive`, which essentially 
does:
   
   
https://github.com/apache/arrow/blob/cf832b8b5dd91ca1b70519fa544f0a44ebdb3bce/cpp/src/arrow/compute/kernels/scalar_cast_internal.cc#L40-L46
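
For readers without the link at hand, a templated cast loop of this general shape can be sketched as follows. This is an illustration written for comparison with the numpy loop, not a copy of the linked Arrow source; the names `StaticCastLoop` and `CastInt64ToDouble` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not Arrow's actual function): converts `length`
// elements from InT to OutT with a plain element-wise static_cast.
template <typename OutT, typename InT>
void StaticCastLoop(const InT* in, OutT* out, std::size_t length) {
  for (std::size_t i = 0; i < length; ++i) {
    out[i] = static_cast<OutT>(in[i]);
  }
}

// Hypothetical convenience wrapper for the int64 -> double case discussed
// here; it allocates the output and runs the loop above.
inline std::vector<double> CastInt64ToDouble(
    const std::vector<std::int64_t>& values) {
  std::vector<double> result(values.size());
  StaticCastLoop<double>(values.data(), result.data(), values.size());
  return result;
}
```

Calling `CastInt64ToDouble` on a large vector exercises the same kind of loop as the numpy snippet, which makes it a convenient starting point for inspecting the generated assembly of both versions.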
   
   Does anybody have any insight into why our templated C++ code is so much 
slower than numpy's C code? Logically the two look very similar, so is there 
something in our code that prevents optimizations?


