@elcritch Thanks a lot! Yes, I think fewer allocations would be made than with async, but each allocation is more costly for stackful coroutines, so I don't know whether there would be a performance gain. The dispatcher/event loop is the main source of CPU cost, so I don't think one will be much faster than the other. (I have one benchmark where NimGo handles producer/consumer on two pipes 30% faster, and one where it handles 200 sockets 500% slower than std/asyncdispatch, certainly due to an inefficiency in my dispatcher.) There is also the possibility of reusing existing coroutines, but it has some downsides and brings little speed gain, so I will drop that support.
Concerning Golang, I couldn't find much information, but what I understand is that they combine both. They use multiple threads and distribute the coroutines across those threads. That is handled automatically, so when you do `go`, it might be running in another thread (or not). I tried that approach, but it was honestly too memory unsafe and much more complicated. A single-threaded environment is simpler to use, to implement, and to reason about: when you know that your code is the only one running, you don't have to worry much about synchronization, and data races aren't as common.

For debugging, it might not be so simple. I managed to provide two different stacktraces:

* the stacktrace of where the coroutine was created, in non-release mode (essential for debugging)
* the stacktrace of where the error occurs. Because every coroutine is started and resumed by the dispatcher (like std/asyncdispatch) and is a closure (like std/asyncdispatch), it might not always be useful. However, because the user should need `goAsync` less often than `await`, the stacktrace should be deeper, which makes it easier to find where the error occurs.

But I agree that debugging with GDB is a pain. I generally end up with a ton of "echo" in my code. Hopefully that's not needed very often, thanks to Nim's strong type safety.