(I intended to post the message below as a reply to the recent "Is ORC 
considered production ready" thread, but it grew much longer than I expected 
and I think it might deserve a separate thread. Note that I often consider 
myself not smart enough to write proper multi-threaded code, so the text below 
might contain wrong or biased information)

Some time ago I spent several weeks pushing multi-threaded Nim 2.0 with 
shared data to the limits, mostly looking at ARC only.

My adventure started with this post where I asked a similar question: What is 
the state of threading in Nim 2.0, and how to make the best use of this: 
<https://forum.nim-lang.org/t/9617>

My plan was to create a minimal actor-based system in which I can write 
multi-threaded programs without ever having to think about the threading. 
Sharing data between threads typically requires the user to take care of the 
synchronization primitives, which I usually find too cumbersome and error-prone 
for daily work. My end goal was something like Erlang's "Process": a very 
lightweight 'fiber'-like flow of control, allowing millions of those to be 
running on a handful of threads, with all communication done by message 
passing through the processes' mailboxes. The processes are built on top of 
disruptek's fine CPS library, which offers processes at the cost of tens of 
bytes of memory each, with very low scheduling overhead.

This is where shared memory + ARC comes in: for effective message passing of 
"large" data, you typically want to avoid deep copies and instead _move_ the 
data from thread to thread. This requires a few steps to happen in the 
proper order:

  * The sending thread needs to make sure it has zero other references to the 
object it is sending out (including any other objects that are referenced by 
that object itself: recursively, potentially cyclic!), because you do not want 
to have the same data referenced by two threads
  * ARC needs to let go of the object and promise to never again touch the RC 
from the sending thread
  * Proper synchronization needs to take place to make sure the RC and the 
moved data are safely handed off to the receiving thread - this also needs to 
work properly on architectures with weaker memory ordering of course.
  * The receiving thread now takes ownership of the RC.
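
For illustration, a minimal sketch of the ownership hand-off using `std/isolation` from the standard library (documented there as an unstable API). This is not the code from the Actors project: `Message`, `parcel` and `inTransit` are made-up names, and the sketch stays on a single thread, so the synchronization step is left out entirely. It only shows that `isolate` accepts a freshly constructed value, that the wrapper can be moved but not copied, and that `extract` hands the ref to its new owner.

```nim
import std/isolation

type
  Message = ref object   # hypothetical payload type, for illustration only
    payload: seq[int]

# `isolate` compiles only when the compiler can prove that nothing else
# refers to the value, for example when it is constructed in place.
var parcel: Isolated[Message] = isolate(Message(payload: @[1, 2, 3]))

# Isolated[T] cannot be copied, only moved. In a real program this move
# would go through a synchronized queue or mailbox to another thread.
var inTransit = move(parcel)

# The receiving side takes ownership of the ref.
let received = extract(inTransit)
echo received.payload
```

Passing an existing variable to `isolate` (say a `Message` that was stored in a `let` first) is rejected at compile time, which is a large part of why it gets cumbersome for generic data.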



The Actors project is more or less complete, and is in a works-for-me state, 
but I must admit that I have not actually _used_ it for very much after I got 
it to work. For those interested, take a peek at 
<https://github.com/zevv/actors>

A short summary of my conclusions, not complete and in random order:

  * It _can_ be done, but it is fragile, error-prone and feels like walking 
through a minefield. Sharing unmanaged data (aka raw pointers) is of course no 
problem as long as you take care of proper synchronization, but playing well 
with ARC managed data (refs) makes things a lot harder; I still have bad nights 
dreaming of backtraces filled with calls to `nimDecRefIsLast()` and `__eqdestroy__XXX()`
  * ARC will increase and decrease reference counters just about everywhere you 
access or mutate a ref, and it is not easy to make this play well with 
synchronization primitives because ARC is usually doing its work "outside" of 
your code (eg, at the end of a function after your last line of Nim)
  * One of the unsolved problems is proper isolation. To safely pass data 
between threads I had to write some nasty code that recursively peeks at the RC 
headers before Nim refs to inspect the reference count, and only allows moving 
data when the RC is 0. There is no way to effectively ensure this at compile 
time for generic ARC managed data (`isolate[T]` is cumbersome), so this also 
requires error handling to do the right thing when moved data happens to be 
_not_ isolated.
  * I'm not sure if this can be properly made to play well with ORC, for now. 
The problem is that ORC manages some of its data in thread-local variables, 
which makes it impossible to safely move it to another thread. It seems that 
Nim needs some additional infrastructure for this to "reroot" an ORC managed 
ref when moving. `GC_RunORC()` can be used as a workaround to make ORC clean up 
before moving data around, but it comes at a steep price, performance-wise.



My single most important takeaway of this little adventure is: this problem is 
still hard, and Nim will not hold your hand - it will happily shoot you in the 
back of your head when you are not looking. Getting a SIGSEGV right away is 
usually the best result you can hope for, because these are obvious and 
traceable. The problem is of course that a lot of bugs of this class can be 
very, very subtle and can show up in a million different ways, not causing 
crashes but all kinds of other undefined behavior. Not something I want in my 
production code.

If you decide to go play with shared ARC managed memory, do yourself a huge 
favour and **use and trust memory sanitizers like asan/tsan and 
Valgrind/Helgrind/DRD** and take the output very seriously. I have talked to some 
people telling me that they knew what they were doing and that Valgrind was 
just generating false positives. I beg to differ: Valgrind has been right 99% 
of the time. If Valgrind ever generates a false positive, something in your 
code is usually doing "funny stuff" and IMHO deserves proper annotation to make 
it shut up, and inform readers of the code that funny stuff is happening here.
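
As a rough sketch of what such a run can look like, here is a deliberately racy toy program with the build commands in its header comment. The file name `race.nim` and the names `counter`, `worker`, `numThreads` and `iterations` are made up for this example, and the commands assume a GCC or Clang backend with ThreadSanitizer support and Valgrind installed. `-d:useMalloc` makes Nim allocate through the C allocator, so the tools can follow the allocations.

```nim
# race.nim (hypothetical file name): a deliberately broken program that
# gives the tools something to report.
#
# ThreadSanitizer build (assumes a GCC or Clang backend):
#   nim c --mm:arc --debugger:native -d:useMalloc \
#     --passC:-fsanitize=thread --passL:-fsanitize=thread race.nim
#
# Helgrind run on a plain debug build:
#   nim c --mm:arc --debugger:native -d:useMalloc race.nim
#   valgrind --tool=helgrind ./race

const
  numThreads = 2
  iterations = 100_000

var counter: int  # shared by all threads, with no lock and no atomics

proc worker() {.thread.} =
  for _ in 0 ..< iterations:
    inc counter  # unsynchronized read-modify-write: a data race

var threads: array[numThreads, Thread[void]]
for t in threads.mitems:
  createThread(t, worker)
joinThreads(threads)

# May print less than numThreads * iterations because updates can get lost.
echo counter
```

Both tools should flag the unsynchronized `inc counter`. The same flags apply to real programs that share ARC managed refs, where the reports tend to point into the generated destructor and RC code rather than at your own lines.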

My final conclusion would be that ARC simply does not play well with threading 
in the current state unless you really, really know what you are doing. Having 
atomic RC types in the language would take most of these headaches away.
