felipecrv commented on issue #40646: URL: https://github.com/apache/arrow/issues/40646#issuecomment-2018330903
Unless we use an ordering that makes the stats incorrect (read and increment without regard for what other cores have done), just changing the memory ordering won't remove the contention: updating a value atomically requires a trip across the memory bus. I re-ordered the loads and stores in a way that helps mask the latency in #40647, but a more complete solution involves pushing the cost to the load side: we keep multiple counters that are updated locally (local = per CPU core), and when we need to load a stat for reporting we read all of those locations and sum them. This post explains the technique (applied to a different problem): https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html

In distributed systems this idea is called "sharded counters". The principle is the same: distribute the increments and pay a latency cost when reading the value.
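A minimal sketch of the idea, not the Arrow implementation (the `ShardedCounter` name, the shard-selection strategy, and the shard count are all assumptions made for illustration): increments hit one of several cache-line-padded atomic slots, and reads sum every slot.

```cpp
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

// Hypothetical sharded counter: writers update a per-shard slot, readers
// pay the cost of touching every shard and summing.
class ShardedCounter {
 public:
  explicit ShardedCounter(size_t num_shards = std::thread::hardware_concurrency())
      : shards_(num_shards > 0 ? num_shards : 1) {}

  void Add(int64_t delta) {
    // Pick a shard from the thread id hash; a real implementation might use
    // a CPU-id intrinsic or a thread-local shard index instead.
    size_t shard =
        std::hash<std::thread::id>{}(std::this_thread::get_id()) % shards_.size();
    // Relaxed is enough here: each slot is only a partial count, and the
    // final value only has to be correct after summing all slots.
    shards_[shard].value.fetch_add(delta, std::memory_order_relaxed);
  }

  int64_t Load() const {
    // The read side pays the latency: visit every shard and sum.
    int64_t total = 0;
    for (const auto& s : shards_) {
      total += s.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  // Pad each slot to a cache line so shards don't false-share.
  struct alignas(64) Slot {
    std::atomic<int64_t> value{0};
  };
  std::vector<Slot> shards_;
};

int main() {
  ShardedCounter counter;
  std::vector<std::thread> threads;
  for (int t = 0; t < 8; ++t) {
    threads.emplace_back([&counter] {
      for (int i = 0; i < 100000; ++i) counter.Add(1);
    });
  }
  for (auto& th : threads) th.join();
  std::cout << counter.Load() << std::endl;  // prints 800000
  return 0;
}
```

With writers spread across shards, increments mostly stay in per-core cache lines instead of fighting over a single hot atomic; the trade-off is that `Load()` is no longer a single load and the value it returns is only a consistent snapshot if the workload tolerates slight staleness.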
