felipecrv commented on issue #40646:
URL: https://github.com/apache/arrow/issues/40646#issuecomment-2018330903

   Unless we use an ordering that makes the stats incorrect (read and increment without caring about what other cores have done), just changing the memory ordering won't help with the contention: updating a value atomically still requires a trip across the memory bus.
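
   To make that concrete, here is a generic sketch (not the code touched by #40647; the counter name and function are hypothetical) of the kind of contended update in question. Even with `std::memory_order_relaxed`, the increment is a read-modify-write that has to own the cache line exclusively, so the contention stays:

```cpp
#include <atomic>
#include <cstdint>

// A single shared stats counter; the names here are illustrative only.
std::atomic<int64_t> bytes_read{0};

void OnRead(int64_t n) {
  // Relaxing the ordering drops the ordering guarantees, but the fetch_add
  // is still an atomic RMW that bounces the cache line between cores.
  bytes_read.fetch_add(n, std::memory_order_relaxed);
}
```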
   
   I re-ordered the loads and stores in #40647 in a way that helps mask the latency, but a more complete solution involves pushing the cost to the load side: keep multiple counters that can be updated locally (local = CPU core), and when we have to load a stat for reporting, ask every location for its value and sum them (see the sketch below).
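
   A minimal sketch of that idea, assuming C++ and picking a slot by hashing the thread id (per-core slot selection, e.g. via `sched_getcpu` or restartable sequences, would cut the residual contention further; the class name and shard count below are illustrative, not Arrow's API):

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <functional>
#include <thread>

class ShardedCounter {
 public:
  void Add(int64_t n) {
    // Pick a shard per thread so most increments stay on a "local" cache line.
    size_t shard =
        std::hash<std::thread::id>{}(std::this_thread::get_id()) % kNumShards;
    shards_[shard].value.fetch_add(n, std::memory_order_relaxed);
  }

  int64_t Load() const {
    // The cost is paid here: reporting sums every shard.
    int64_t total = 0;
    for (const auto& shard : shards_) {
      total += shard.value.load(std::memory_order_relaxed);
    }
    return total;
  }

 private:
  static constexpr size_t kNumShards = 16;
  // One cache line per shard to avoid false sharing between counters.
  struct alignas(64) Shard {
    std::atomic<int64_t> value{0};
  };
  std::array<Shard, kNumShards> shards_;
};
```

   The shard count and the shard-selection strategy trade memory and read-side latency against how much write contention remains.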
   
   This post explains the technique (applied to a different problem): https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html. In distributed systems, the same idea is called "sharded counters". The principle is identical: distribute the increments and pay a latency cost when reading the value.

