You'll have to tell us which algorithm you're trying to implement, or give some more context, because the answer depends on how your threads access the array:
1. If your threads only rarely need concurrent access to the array, you'll be fine, but you might as well use `fetchAdd` from `std/atomics` and skip the locks.
2. If your threads constantly access the array concurrently, the lock will be a bottleneck and your code might be slower than serial code. `fetchAdd` may help, but I would expect it to still be slower, because each update invalidates the other cores' cache lines, leading to "cache thrashing".
3. If each thread accesses distinct cells of the array, you can parallelize without locks or atomics and enjoy a good speedup.
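To make cases 1 and 3 concrete, here is a minimal sketch, not a drop-in solution. The names `cells`, `touched`, `worker`, `NumThreads` and `CellsPerThread` are made up for illustration, and it assumes the work can be split into one contiguous slice per thread. Each thread writes only to its own slice without any synchronization (case 3), and the single rarely-updated shared counter uses `fetchAdd` instead of a lock (case 1). On Nim 1.x you need to compile with `--threads:on`.

```nim
import std/atomics

const
  NumThreads = 4          # hypothetical worker count
  CellsPerThread = 1_000  # hypothetical slice size

var
  cells: array[NumThreads * CellsPerThread, int]  # shared array, no GC'd memory
  touched: Atomic[int]                            # shared counter, updated rarely
  workers: array[NumThreads, Thread[int]]

proc worker(id: int) {.thread.} =
  # Case 3: this thread owns cells[first ..< first + CellsPerThread],
  # so plain writes are safe, no lock or atomic needed.
  let first = id * CellsPerThread
  for i in first ..< first + CellsPerThread:
    cells[i] += 1
  # Case 1: one atomic update per thread instead of a lock.
  discard touched.fetchAdd(CellsPerThread)

for id in 0 ..< NumThreads:
  createThread(workers[id], worker, id)
joinThreads(workers)

echo touched.load
```

If your algorithm instead has every thread hitting the same few cells all the time (case 2), neither variant will scale well, and giving each thread a private accumulator that is merged once at the end is usually the better direction.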