ei-grad commented on pull request #12460:
URL: https://github.com/apache/arrow/pull/12460#issuecomment-1076369235


   Modern amd64 CPUs can compute cumsum at least 30% faster with a non-naive 
algorithm like [this 
one](https://en.wikipedia.org/wiki/Prefix_sum#Algorithm_2:_Work-efficient). 
This is not about parallelism, just about the latency of an `ADD` instruction 
that depends on the output of the previous `ADD`. It can be even faster when 
relying on SIMD/intrinsics. Here is another reference with some numbers: 
https://github.com/joelangeway/CumulativeSum.
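
   To illustrate the dependency-chain point, here is a minimal sketch, not 
Arrow code. The names `CumSumNaive` and `CumSumBlocked` and the block size of 4 
are assumptions made up for this example. The blocked variant computes the 
partial sums inside each block independently of the running total, so only one 
`ADD` per four elements sits on the serial chain. It uses integers, assumes the 
sums do not overflow, and makes no claim about the actual speedup.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline: every ADD has to wait for the previous running total.
std::vector<std::int64_t> CumSumNaive(const std::vector<std::int64_t>& in) {
  std::vector<std::int64_t> out(in.size());
  std::int64_t running = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    running += in[i];
    out[i] = running;
  }
  return out;
}

// Hypothetical blocked variant: the sums local to a block of 4 do not depend
// on `carry`, so the CPU can start them while earlier ADDs are still in flight.
std::vector<std::int64_t> CumSumBlocked(const std::vector<std::int64_t>& in) {
  const std::size_t n = in.size();
  std::vector<std::int64_t> out(n);
  std::int64_t carry = 0;  // prefix sum of everything before the current block
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    // Block-local partial sums, independent of `carry`.
    const std::int64_t s01 = in[i] + in[i + 1];
    const std::int64_t s23 = in[i + 2] + in[i + 3];
    const std::int64_t s012 = s01 + in[i + 2];
    const std::int64_t s0123 = s01 + s23;
    // Each output needs exactly one ADD that involves `carry`.
    out[i] = carry + in[i];
    out[i + 1] = carry + s01;
    out[i + 2] = carry + s012;
    out[i + 3] = carry + s0123;
    carry = out[i + 3];
  }
  // Tail: fewer than 4 elements left, fall back to the simple loop.
  for (; i < n; ++i) {
    carry += in[i];
    out[i] = carry;
  }
  return out;
}
```

   For integer input both functions return identical results. For floating 
point the blocked variant reorders the additions, so the last bits may differ 
from the naive loop.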
   
   Would it be possible to implement such optimizations in arrow?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

