@Alogani

> Well you can have both in the same program but I have never seen a cooperative scheduling system produce parallelism.
What I mean is that you can use stackful coroutines/fibers AND a multithreading runtime to have parallelism.

Now there seems to be a misalignment about what "cooperative" means. I use cooperative as in: you yield explicitly. Given that only the OS, or code cooperating with the OS (via signals), can preempt execution, most schedulers are cooperative.

> Of course, stackless coroutines are more efficient on both memory usage and CPU cost, but only stackful coroutines allow suspending and resuming execution at arbitrary depth, if I am not wrong.

You're not wrong, but this is not a problem; I mention it here: <https://github.com/mratsim/weave-io-research/blob/master/design/design_1_coroutines.md>

> Coroutines come in flavors: asymmetric vs symmetric. An asymmetric coroutine can only hand control back to its current owner (caller or resumer). They have asymmetric power. A symmetric coroutine can switch to any other coroutine, not only its caller/resumer. Asymmetric and symmetric can be implemented in terms of each other (with overhead). The difference is in the ergonomics we want by default.

Unless you're doing manual scheduling Lua-style, coroutines are usually managed by a scheduler, and that scheduler can simulate jumping anywhere in the stack. In practice, jumping anywhere in the stack breaks debugging and exceptions.

@elcritch

> One thing that always seemed better about stackful coroutines vs async futures would be much fewer allocations. Every async call that has any state incurs allocation overhead.

That's not a fundamental problem but an engineering one. Rust tried hard to have zero-cost futures with no allocation, but their futures are incompatible with the completion model of IO from io_uring and Windows IOCP (IO Completion Ports) because:

1. Rust futures are aligned with a readiness-based model, where the OS signals that data is ready and Rust then fetches it, copying from a kernel buffer to a user-space buffer.
2. To enable OS zero-copy, the completion model instead requires a buffer from the user up front, which introduces lifetime issues that must be solved by heap allocation.
3. The OS copies data directly into that buffer and signals completion.

Hence the best futures can do is a single allocation in the general case. Constantine's threadpool and the future Weave-IO have this: <https://github.com/mratsim/weave-io/blob/8672f1c/weave_io/crossthread/tasks_flowvars.nim#L14-L52>

Everything is intrusive:

* The task has an intrusive linked-list field that can be used in MPSC task queues.
* The ownership state is intrusive, to synchronize handover between producers and consumers and to deallocate the task. (Some tasks have no result and thus no consumer, so producers must deallocate them; others have a consumer.)
* The result of the future is intrusive.

And in the optimal case, if we have a guarantee that a task cannot escape, i.e. that a task and its descendants are finished before we exit a scope (also called structured parallelism), we can reach zero allocations with the compiler's help by allocating the task on the stack. This optimization is often called "heap allocation elision"; see coroutine heap allocation elision: <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0981r0.html>.

Note: this is not possible with fibers/stackful coroutines, because they mmap their own stack, making them unsuitable for implementing iterators / higher-order functions like <https://godbolt.org/g/26viuZ>

@Alogani

> I have been working hard to address some issues in my NimGo library. I have implemented custom stacktrace management in debug mode to allow for better debugging: in case of a raise, it is possible to see, in a nested way, where the error occurs and who the callers were.
>
> The author of the paper you gave is the creator of the implementation of stackless coroutines in C++, so it is logical that he thinks his model is better.
> However, the cost of function coloring is IMHO just too high for non-performance-sensitive usage. I have also found this paper by one of his peers that disagrees with him; you might find it interesting: <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0866r0.pdf>

I'm aware of the paper. My main issue with fibers in Nim is the handling of exceptions; I'm not sure how it works with `try / except` and `try / finally`.

Do note that all the languages that tried fibers (Java 0.1, Go and Rust) abandoned them, also because of segmented stacks:

* Rust: <https://web.archive.org/web/20131107015411/https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html>
* Go: <https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub>

In any case, it's good that you made headway regarding exceptions; IIRC <https://nim-lang.org/docs/coro.html> was not compatible with exceptions.

* * *

You might also want to read through the research, design and implementation ideas that I stored in <https://github.com/mratsim/weave-io-research>.

While Go does not have user-visible coloring, it certainly has calling-convention coloring, meaning you cannot call Go functions from C directly. The granddaddy of multithreading, Cilk, also had this limitation, and even generated two versions of each function: one with a custom stack to implement continuation stealing, and one with the normal ABI that can be called externally. Dealing with that calling-convention coloring is extremely annoying; see <https://words.filippo.io/rustgo/>
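To make the function-coloring point concrete: Python generators behave like stackless coroutines, and a suspension point must be threaded through every intermediate frame with `yield from`. An inner plain function cannot suspend its callers. A minimal sketch (all names are illustrative):

```python
# Stackless coroutines (Python generators): a suspension must be
# propagated through every frame with `yield from` -- this is
# "function coloring" in miniature.

def leaf():
    # The only way to suspend here is to be a generator itself...
    yield "suspended at depth 2"
    return "leaf done"

def middle():
    # ...and every caller on the path must be "colored" too.
    result = yield from leaf()
    return f"middle saw: {result}"

def root():
    result = yield from middle()
    yield f"root saw: {result}"

g = root()
print(next(g))  # -> suspended at depth 2
print(next(g))  # -> root saw: middle saw: leaf done
```

A stackful coroutine/fiber, by contrast, could suspend from inside `leaf` even if `middle` and `root` were ordinary uncolored functions, because it owns its entire stack. That is exactly the ergonomic trade-off the two papers above argue about.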
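The readiness-vs-completion distinction can also be sketched in a few lines. This is a hypothetical illustration using a local socketpair: the readiness half uses the stdlib `selectors` module (epoll/kqueue style), while the completion half only *simulates* the io_uring/IOCP shape with `recv_into`, since the Python stdlib has no real completion-based IO:

```python
# Readiness model vs completion model, sketched with a socketpair.
import selectors
import socket

a, b = socket.socketpair()
b.send(b"hello")

# --- Readiness model (epoll/kqueue style) ---
# 1. Ask the OS to tell us when `a` is readable.
sel = selectors.DefaultSelector()
sel.register(a, selectors.EVENT_READ)
sel.select(timeout=1)       # OS: "data is ready"
# 2. Only now do we fetch it: the copy from the kernel buffer into a
#    freshly created user buffer happens here, after readiness.
data = a.recv(1024)
print(data)                 # b'hello'
sel.unregister(a)

# --- Completion model (io_uring / IOCP style), simulated ---
# The caller hands a buffer to the OS *up front*; it must stay alive
# until the operation completes -- the lifetime problem that forces
# heap allocation in Rust futures.
buf = bytearray(1024)       # buffer provided before the IO happens
b.send(b"world")
n = a.recv_into(buf)        # the OS fills our buffer directly
print(bytes(buf[:n]))       # b'world'

a.close()
b.close()
```

The key difference is *when* the user buffer enters the picture: after readiness in the first half, before submission in the second.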
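And for the single-allocation, everything-intrusive task layout: the idea is that the queue link, the ownership state, and the result slot are all fields of the one task object, instead of three separate allocations (queue node + synchronization state + future result). A hypothetical Python sketch, only loosely mirroring the real lock-free Nim code linked above (a `Lock` stands in for the MPSC protocol):

```python
# Single-allocation future: queue link, ownership state and result
# slot are all intrusive fields of the one Task object.
from threading import Lock

class Task:
    __slots__ = ("fn", "next", "state", "result")
    def __init__(self, fn):
        self.fn = fn
        self.next = None        # intrusive MPSC queue link
        self.state = "pending"  # intrusive ownership/handover state
        self.result = None      # intrusive result slot of the future

class IntrusiveQueue:
    # Tasks themselves form the linked list: pushing allocates nothing.
    def __init__(self):
        self.head = None
        self.tail = None
        self.lock = Lock()      # stand-in for the lock-free protocol

    def push(self, task):
        with self.lock:
            if self.tail is None:
                self.head = self.tail = task
            else:
                self.tail.next = task
                self.tail = task

    def pop(self):
        with self.lock:
            task = self.head
            if task is not None:
                self.head = task.next
                if self.head is None:
                    self.tail = None
                task.next = None
            return task

q = IntrusiveQueue()
t = Task(lambda: 21 * 2)
q.push(t)                  # no extra queue node allocated

worker = q.pop()           # consumer side dequeues the task itself
worker.result = worker.fn()
worker.state = "completed"

print(t.state, t.result)   # completed 42
```

In the real thing, `state` is an atomic that also decides which side (producer or consumer) frees the task; with structured parallelism and heap-allocation elision, `Task` could live on the stack and the count drops to zero allocations.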