Replying mostly to your first reply @GordonBGood

> threadpool/spawn/FlowVar is a beautiful modern concept for multi-threading

Agreed

> So in a few hours I took my wrapper implementations and re-wrote my required 
> parts of threadpool to not consider GC'ed data structures in about 150 lines 
> (including the wrapper implementations), and that "just worked" without 
> compiler magic or generic code limitations. In fact, this implementation is 
> almost twice as fast as the original even when it worked and that is in spite 
> of using atomically reference counted wrappers as a general memory management 
> destructor control system; the newruntime's ownership on ref's model will be 
> even faster and I think will work in this application's use case as 
> ownership doesn't need to be shared beyond the creating entities. I can post 
> this code somewhere if anyone is interested.

That's also the approach I took in my proof-of-concept.

> As to "cycle stealing"/"load balancing", the threadpool/spawn model actually 
> takes care of that as the underlying platform implementation of threads will 
> have a scheduler that automatically "time-slices" across all the currently 
> running threads from the threadpool; it is only necessary to provide 
> tasks/work to these threads to be in big enough time units so as to be 
> significantly larger than the overheads of doing the multi-threading.

Work-stealing / load balancing is needed. Most parallel tree algorithms lead to 
unbalanced thread loads. The industry-standard benchmark for reproducing those 
issues is the Unbalanced Tree Search benchmark: 
[https://sourceforge.net/p/uts-benchmark/wiki/Home](https://sourceforge.net/p/uts-benchmark/wiki/Home)

> I won't be commenting on your Picasso RFC as, in my mind for everyday users 
> and me, it is too complex, so I'll let those that perhaps have more 
> sophisticated needs contribute to it and possibly implement it if it seems to 
> fill their needs.

I think you misunderstood the RFC audience/section or I was not clear enough.

The API for library users is `spawn`/`^`; you can see two examples in my 
proof-of-concept:

  * [Fibonacci](https://github.com/mratsim/weave/blob/master/e04_channel_based_work_stealing/tests/fib.nim#L17) (`async` will be renamed `spawn` and `await` will be renamed `^`)
  * [Single producer multi-consumer task loop](https://github.com/mratsim/weave/blob/master/e04_channel_based_work_stealing/tests/spc.nim#L80)
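To make the user-facing shape concrete, here is a minimal sketch of the `spawn`/`^` pattern using Nim's stdlib `threadpool` module (the names in the proof-of-concept are `async`/`await`, to be renamed as noted above; the `partialSum` proc is an invented example, not from the PoC):

```nim
# Sketch of the spawn/^ user-facing API using Nim's stdlib threadpool.
# Compile with: nim c --threads:on
import threadpool

proc partialSum(a, b: int): int =
  ## Sums the integers in [a, b)
  for i in a ..< b:
    result += i

proc main() =
  # `spawn` forks a task and immediately returns a FlowVar[int].
  let lo = spawn partialSum(0, 500)
  let hi = spawn partialSum(500, 1000)
  # `^` blocks until a FlowVar is ready and returns its value.
  echo ^lo + ^hi  # prints 499500 (= sum of 0..999)

main()
```

That is the whole surface area a library user sees: fork with `spawn`, join with `^`.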



The rest of the RFC is about giving the Nim community a full picture of all the 
building blocks and considerations that go into writing a multithreading 
runtime, i.e. those are implementation "details" (as much as I hate the 
expression).

Most of the sophistication is there to support everything from naive usage by 
people following multithreading tutorials with the pi/fibonacci examples, to 
unbalanced workloads, to high-performance computing.

The simplest implementation with no scheduler moves the complexity from the 
library writer to the library user, who is most likely not an expert in 
efficient multithreading.

---

Also, the current threadpool/spawn/^ does not work on the "Hello World!" of 
multithreading, which for better or worse is fibonacci (an even sillier 
benchmark than the pi algorithm), see 
[https://github.com/nim-lang/Nim/issues/11922](https://github.com/nim-lang/Nim/issues/11922).
 Note that even GCC's version of OpenMP chokes on that benchmark because it 
uses a naive global task queue (like Nim's threadpool) that is a contention 
point, while LLVM's OpenMP uses work-stealing.
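For reference, the benchmark shape in question is the naive recursive fibonacci where every recursion level spawns a child task. A sketch with the stdlib `threadpool` (per the linked issue, this pattern can choke the current implementation at larger `n`, so treat it as illustrative):

```nim
# Naive parallel fibonacci: the pattern from nim-lang/Nim#11922.
# Compile with: nim c --threads:on
import threadpool

proc fib(n: int): int =
  # Each level spawns one child task, producing a deep, fine-grained
  # task tree: cheap under work-stealing, but a contention point
  # with a global task queue.
  if n < 2: return n
  let x = spawn fib(n - 1)  # forked child task
  let y = fib(n - 2)        # computed inline by the current thread
  result = ^x + y           # join on the child's FlowVar

echo fib(20)  # 6765 at this small n; larger n stresses the threadpool
```

The point of the benchmark is not the arithmetic but the task tree it generates: millions of tiny tasks that a scheduler must distribute cheaply.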
