Don't update the same data structure concurrently from multiple threads: the heavy synchronization creates a huge contention bottleneck, and the program may even end up slower than a single-threaded one due to cache thrashing.
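To make the contention point concrete, here is a small illustrative sketch (in Go, just for demonstration; the names `sharedCounter`/`localCounters` are mine, not from any library): both functions compute the same total, but the first makes every worker hammer one shared atomic, while the second gives each worker private state and merges once at the end.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// sharedCounter: every goroutine increments one shared counter.
// The cache line holding `shared` ping-pongs between cores, so
// adding workers adds contention instead of speedup.
func sharedCounter(workers, n int) int64 {
	var shared int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				atomic.AddInt64(&shared, 1)
			}
		}()
	}
	wg.Wait()
	return shared
}

// localCounters: each goroutine counts into its own local variable
// and results are merged once at the end, so there is no
// synchronization on the hot path.
func localCounters(workers, n int) int64 {
	results := make([]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			var local int64
			for i := 0; i < n; i++ {
				local++
			}
			results[id] = local
		}(w)
	}
	wg.Wait()
	var total int64
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sharedCounter(4, 100000))  // 400000
	fmt.Println(localCounters(4, 100000)) // 400000
}
```

(In real code you would also pad the `results` slots to avoid false sharing, but the structural point stands: keep the hot path synchronization-free.)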
Assuming your algorithm is tree-like (I'm familiar with Go bots but not chess bots), ideally you have a tree data structure and either:

* you launch a thread per separate branch and keep synchronization restricted to the 2 threads sharing a sub-branch, a.k.a. branch-parallelism, or
* you duplicate your data structure per thread to avoid paying the synchronization cost at all, a.k.a. tree-parallelism.

If you want to know how big the synchronization cost can be, my Fibonacci benchmark of GCC's OpenMP implementation versus LLVM's shows a factor of 100..000x ([https://github.com/mratsim/weave/tree/master/benchmarks/fibonacci](https://github.com/mratsim/weave/tree/master/benchmarks/fibonacci)).

Today you can use my experimental Weave code, which gives you async-like semantics and very efficient multithreading. For usage see:

* [https://github.com/mratsim/weave/blob/master/e04_channel_based_work_stealing/async_internal.nim#L138-L180](https://github.com/mratsim/weave/blob/master/e04_channel_based_work_stealing/async_internal.nim#L138-L180)
* [https://github.com/mratsim/weave/blob/master/e04_channel_based_work_stealing/async_for_internal.nim#L129-L151](https://github.com/mratsim/weave/blob/master/e04_channel_based_work_stealing/async_for_internal.nim#L129-L151)

It's experimental code; nonetheless it works and is backed by 3 years of PhD research. I suggest you submodule the library. In the future I hope to make it a proper high-level library, see the [Project Picasso RFC](https://forum.nim-lang.org/t/5083).

Alternatively you can use Nim's threadpool, but it suffers from the same issues as GCC's OpenMP:

* it uses a single global queue that enqueues/dequeues all tasks;
* consequently it cannot do load balancing (work-stealing);
* it chokes if tasks are small (say 1 ms per task), and the queue data structure becomes the contention point.
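The branch-parallelism idea above can be sketched as follows (again in Go for illustration; the `Node` type, the max-score `search`, and `parallelSearch` are toy stand-ins I invented, not the Weave API): one worker per top-level branch, each reading only its own subtree, with the only synchronization being the final merge of per-branch results.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a toy game tree; Score stands in for a position evaluation.
type Node struct {
	Score    int
	Children []*Node
}

// search does a plain sequential max-score walk over a subtree.
func search(n *Node) int {
	best := n.Score
	for _, c := range n.Children {
		if s := search(c); s > best {
			best = s
		}
	}
	return best
}

// parallelSearch launches one goroutine per top-level branch.
// Each goroutine touches only its own subtree, so no locking is
// needed on the hot path; results are merged after all branches join.
func parallelSearch(root *Node) int {
	results := make([]int, len(root.Children))
	var wg sync.WaitGroup
	for i, c := range root.Children {
		wg.Add(1)
		go func(i int, c *Node) {
			defer wg.Done()
			results[i] = search(c)
		}(i, c)
	}
	wg.Wait()
	best := root.Score
	for _, r := range results {
		if r > best {
			best = r
		}
	}
	return best
}

func main() {
	root := &Node{Score: 1, Children: []*Node{
		{Score: 5, Children: []*Node{{Score: 9}}},
		{Score: 3, Children: []*Node{{Score: 7}, {Score: 2}}},
	}}
	fmt.Println(parallelSearch(root)) // 9
}
```

For tree-parallelism you would instead give each worker its own full copy of the tree (or transposition table) and reconcile the copies periodically; the trade-off is memory for the complete absence of hot-path synchronization.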
