jorisvandenbossche commented on issue #40874:
URL: https://github.com/apache/arrow/issues/40874#issuecomment-2025511001
Looking into the inner loops of both Arrow and numpy, they seem to be quite
similar. In numpy, almost all of the time is spent in
`_aligned_contig_cast_long_to_double`, which essentially boils down to:
```c
npy_intp N = dimensions[0];
char *src = args[0], *dst = args[1];
while (N--) {
    *(npy_double *)dst = ((npy_double)(*(npy_long *)src));
    dst += sizeof(npy_double);
    src += sizeof(npy_long);
}
```
and in Arrow, almost all time is spent in `CastPrimitive`, which essentially
does:
https://github.com/apache/arrow/blob/cf832b8b5dd91ca1b70519fa544f0a44ebdb3bce/cpp/src/arrow/compute/kernels/scalar_cast_internal.cc#L40-L46
Does anybody have any insight into why our templated C++ code is so much slower
than numpy's C code? Logically the two look very similar, or is there something
in our code that prevents optimizations?
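
For anyone who wants to poke at this outside of either library, here is a minimal standalone sketch of the two loop shapes. The function names `cast_byte_stride` and `cast_typed` are hypothetical, `int64_t` and `double` stand in for `npy_long` and `npy_double` (an assumption that holds on typical 64-bit Linux), and the typed-pointer variant is only a generic static-cast loop, not a copy of the Arrow source linked above.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-pointer loop, same shape as the numpy snippet quoted above.
   Hypothetical name; int64_t/double are assumed stand-ins for
   npy_long/npy_double. */
void cast_byte_stride(const char *src, char *dst, size_t n) {
    while (n--) {
        *(double *)dst = (double)(*(const int64_t *)src);
        dst += sizeof(double);
        src += sizeof(int64_t);
    }
}

/* Typed-pointer loop: a generic element-wise cast, written here only
   for comparison. It is an illustration, not Arrow's actual kernel. */
void cast_typed(const int64_t *in, double *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        out[i] = (double)in[i];
    }
}
```

Compiling both functions with the same flags (for example `-O3`) and comparing the generated assembly should show whether the compiler treats the two shapes differently. If it does not, the difference presumably lies outside the inner loop itself.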