jhorstmann commented on issue #1400: URL: https://github.com/apache/arrow-rs/issues/1400#issuecomment-1059743980
Interesting, I'm actually seeing the opposite effect, with `fold` being faster. This is running on an AMD 3700U laptop. The generated code for `fold` and `reduce` seems to differ in that `reduce` contains several branches while `fold` looks relatively branchless. Different CPUs might handle one or the other better, although I would expect the branchless version to win in the benchmark since the input consists of random numbers. I don't have a good explanation for why this affects the null handling loop.
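
For readers without the kernel code at hand, here is a minimal sketch of the two iterator shapes being compared, assuming a plain minimum over `i32` values. The names `min_fold` and `min_reduce` are hypothetical and this is not the arrow-rs implementation; it only illustrates that `fold` starts from an explicit seed while `reduce` seeds itself from the first element and returns an `Option`.

```rust
/// Minimum via `fold` (hypothetical helper, not the arrow-rs kernel).
/// The accumulator starts from an explicit seed, so no `Option` is involved.
/// For an empty slice the seed `i32::MAX` is returned unchanged.
fn min_fold(values: &[i32]) -> i32 {
    values
        .iter()
        .copied()
        .fold(i32::MAX, |acc, v| if v < acc { v } else { acc })
}

/// Minimum via `reduce` (hypothetical helper, not the arrow-rs kernel).
/// The first element is used as the seed, so the result is `None`
/// for an empty slice and `Some(min)` otherwise.
fn min_reduce(values: &[i32]) -> Option<i32> {
    values
        .iter()
        .copied()
        .reduce(|acc, v| if v < acc { v } else { acc })
}
```

Both helpers return the same minimum for non-empty input. Whether either one compiles to branchy or branchless code depends on the compiler version, target CPU, and optimization flags, so the generated assembly is worth inspecting on the machine where the benchmark runs.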
