ei-grad commented on pull request #12460: URL: https://github.com/apache/arrow/pull/12460#issuecomment-1076369235
Modern amd64 CPUs can compute cumsum at least 30% faster with a non-naive algorithm like [this one](https://en.wikipedia.org/wiki/Prefix_sum#Algorithm_2:_Work-efficient) (this is not about parallelism, just about the latency of an `ADD` instruction that depends on the output of the previous `ADD`), or even faster when relying on SIMD intrinsics. Here is another reference with some numbers: https://github.com/joelangeway/CumulativeSum. Would it be possible to implement such optimizations in Arrow?
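To make the dependency-chain point concrete, here is a minimal C++ sketch. It is not Arrow's implementation and not the code from either link. The function names `cumsum_naive` and `cumsum_blocked` and the block size of 4 are assumptions chosen for illustration. The blocked variant computes the prefix sums inside each block without touching the running total, so only one `ADD` per block sits on the loop-carried chain. It assumes integer input with no overflow; for floating point the reassociation would change rounding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline: each iteration's ADD must wait for the previous iteration's result.
std::vector<std::int64_t> cumsum_naive(const std::vector<std::int64_t>& in) {
  std::vector<std::int64_t> out(in.size());
  std::int64_t acc = 0;
  for (std::size_t i = 0; i < in.size(); ++i) {
    acc += in[i];
    out[i] = acc;
  }
  return out;
}

// Hypothetical blocked variant (illustration only): the local prefix sums
// p0..p3 do not depend on `carry`, so the CPU can compute them while the
// previous block's carry is still in flight.
std::vector<std::int64_t> cumsum_blocked(const std::vector<std::int64_t>& in) {
  const std::size_t n = in.size();
  std::vector<std::int64_t> out(n);
  std::int64_t carry = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    const std::int64_t p0 = in[i];
    const std::int64_t p1 = p0 + in[i + 1];
    const std::int64_t s23 = in[i + 2] + in[i + 3];  // independent of p1
    const std::int64_t p2 = p1 + in[i + 2];
    const std::int64_t p3 = p1 + s23;
    // These four ADDs all read the same `carry` and are independent of each other.
    out[i] = carry + p0;
    out[i + 1] = carry + p1;
    out[i + 2] = carry + p2;
    out[i + 3] = carry + p3;
    carry += p3;  // the only ADD on the loop-carried dependency chain
  }
  // Tail: fewer than 4 elements left, fall back to the simple loop.
  for (; i < n; ++i) {
    carry += in[i];
    out[i] = carry;
  }
  return out;
}
```

Both functions return the same values for integer input. Whether the blocked form is actually faster depends on the compiler, the optimization flags, and the CPU, so it would need benchmarking before drawing conclusions.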
