(I intended to post the message below as a reply to the recent "Is ORC considered production ready" thread, but it grew much longer than I expected and I think it might deserve a separate thread. Note that I often consider myself not smart enough to write proper multi-threaded code, so the text below might contain wrong or biased information.)

Some time ago I spent several weeks pushing multi-threaded Nim 2.0 with shared data to its limits, looking mostly at ARC. My adventure started with this post, where I asked a similar question: what is the state of threading in Nim 2.0, and how to make the best use of it? <https://forum.nim-lang.org/t/9617>

My plan was to create a minimal actor-based system in which I can write multi-threaded programs without ever having to think about the threading; sharing data between threads typically requires the user to take care of the synchronization primitives, which I usually find too cumbersome and error-prone for daily work. My end goal was something like Erlang's "Process": a very lightweight, fiber-like flow of control, allowing millions of those to run on a handful of threads, with all communication done by message passing through the processes' mailboxes. The processes are built on top of disruptek's fine CPS library, which offers processes at the cost of tens of bytes of memory each, with very low scheduling overhead.

This is where shared memory + ARC comes in: for effective message passing of "large" data, you typically want to avoid deep copies and instead _move_ the data from thread to thread. This requires a few steps to happen in the proper order:

* The sending thread needs to make sure it has zero other references to the object it is sending out (including any other objects that are referenced by that object itself: recursively, potentially cyclic!), because you do not want to have the same data referenced by two threads.
* ARC needs to let go of the object and promise to never again touch the RC from the sending thread.
* Proper synchronization needs to take place to make sure the RC and the moved data are safely handed off to the receiving thread. This of course also needs to work properly on architectures with weaker memory ordering.
* The receiving thread now takes ownership of the RC.

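
The four steps above can be sketched with nothing but the standard library. This is a minimal illustration and not how the Actors project does it: it assumes Nim 2.0 with `--mm:arc` and threads enabled, and the names `Payload`, `Mailbox`, `put`, `take`, `receiver` and `demo` are made up for the example. It leans on `std/isolation`, so it only accepts values the compiler can prove isolated (here a freshly constructed object), and it uses a plain lock plus condition variable for the hand-off.

```nim
import std/[isolation, locks]

type
  Payload = ref object          # hypothetical message type
    values: seq[int]

  Mailbox = object              # hypothetical one-slot mailbox
    lock: Lock
    cond: Cond
    full: bool
    slot: Isolated[Payload]
    sum: int                    # written by the receiver, read after joinThread

proc put(box: ptr Mailbox, msg: sink Isolated[Payload]) =
  # Steps 2 and 3: the sender gives up the value by moving it into the
  # slot while holding the lock; it never touches the ref again.
  withLock box.lock:
    box.slot = move(msg)
    box.full = true
    signal box.cond

proc take(box: ptr Mailbox): Payload =
  # Step 4: the receiver waits for a message and takes ownership of it.
  withLock box.lock:
    while not box.full:
      wait(box.cond, box.lock)
    box.full = false
    result = extract box.slot

proc receiver(box: ptr Mailbox) {.thread.} =
  let p = take(box)
  for v in p.values:
    box.sum += v

proc demo(): int =
  var box: Mailbox
  initLock box.lock
  initCond box.cond
  var t: Thread[ptr Mailbox]
  createThread(t, receiver, addr box)
  # Step 1: `isolate` is checked at compile time; a fresh constructor
  # expression has no other references pointing into it.
  put(addr box, isolate(Payload(values: @[1, 2, 3])))
  joinThread t                  # also orders the read of `box.sum` below
  result = box.sum
  deinitCond box.cond
  deinitLock box.lock

when isMainModule:
  echo demo()
```

The sketch sidesteps the hard part described in this post: it only works because the payload is built in place. Data that has already been passed around the sending thread cannot be proven isolated this way, which is where the runtime RC inspection and its error handling come in.
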
The Actors project is more or less complete and is in a works-for-me state, but I must admit that I have not actually _used_ it for very much after I got it to work. For those interested, take a peek at <https://github.com/zevv/actors>

A short summary of my conclusions, not complete and in random order:

* It _can_ be done, but it is fragile, error-prone and feels like walking through a minefield. Sharing unmanaged data (aka raw pointers) is of course no problem as long as you take care of proper synchronization, but playing well with ARC managed data (refs) makes things a lot harder; I still have bad nights dreaming of backtraces filled with calls to `nimDecRefIsLast()` and `__eqdestroy__XXX()`.
* ARC will increase and decrease reference counters just about everywhere you access or mutate a ref, and it is not easy to make this play well with synchronization primitives because ARC is usually doing its work "outside" of your code (e.g. at the end of a function, after your last line of Nim).
* One of the unsolved problems is proper isolation. To safely pass data between threads I had to write some nasty code that recursively peeks at the RC headers stored in front of Nim refs to inspect the RC counter value, and only allows moving data when the RC is 0. There is no way to effectively ensure this at compile time for generic ARC managed data (`isolate[T]` is cumbersome), so this also requires error handling to do the right thing when moved data happens to be _not_ isolated.
* I'm not sure if this can be made to play well with ORC, for now. The problem is that ORC manages some of its data in thread-local variables, which makes it impossible to safely move that data to another thread. It seems that Nim needs some additional infrastructure to "reroot" an ORC managed ref when moving it. `GC_RunORC()` can be used as a workaround to make ORC clean up before moving data around, but it comes at a steep price, performance-wise.

My single most important takeaway of this little adventure is: this problem is still hard, and Nim will not hold your hand. It will happily shoot you in the back of your head when you are not looking. Getting a SIGSEGV right away is usually the best result you can hope for, because these are obvious and traceable. The problem is of course that a lot of bugs of this class can be very, very subtle and can show up in a million different ways, not causing crashes but all kinds of other undefined behavior. Not something I want in my production code.

If you decide to go play with shared ARC managed memory, do yourself a huge favour and **use and trust memory sanitizers like asan/tsan and Valgrind/Helgrind/Drd**, and take the output very seriously. I have talked to some people telling me that they knew what they were doing and that Valgrind was just generating false positives. I beg to differ: Valgrind has been right 99% of the time. If Valgrind ever generates a false positive, something in your code is usually doing "funny stuff" and IMHO deserves a proper annotation, both to make it shut up and to inform readers of the code that funny stuff is happening here.

My final conclusion would be that ARC simply does not play well with threading in its current state unless you really, really know what you are doing. Having atomic RC types in the language would take most of these headaches away.
