Replying mostly to your first reply @GordonBGood

> threadpool/spawn/FlowVar is a beautiful modern concept for multi-threading


> So in a few hours I took my wrapper implementations and re-wrote my required 
> parts of threadpool to not consider GC'ed data structures in about 150 lines 
> (including the wrapper implementations), and that "just worked" without 
> compiler magic or generic code limitations. In fact, this implementation is 
> almost twice as fast as the original even when it worked and that is in spite 
> of using atomically reference counted wrappers as a general memory management 
> destructor control system; the newruntime's ownership on ref's model will be 
> even faster and I think will work in this applications's use case as 
> ownership doesn't need to be shared beyond the creating entities. I can post 
> this code somewhere if anyone is interested.

That's also the approach I took in my proof-of-concept.

> As to "cycle stealing"/"load balancing", the threadpool/spawn model actually 
> takes care of that as the underlying platform implementation of threads will 
> have a scheduler that automatically "time-slices" across all the currently 
> running threads from the threadpool; it is only necessary to provide 
> tasks/work to these threads to be in big enough time units so as to be 
> significantly larger than the overheads of doing the multi-threading.

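Work-stealing / load balancing is needed. The OS scheduler only time-slices 
threads across cores; it does not move queued tasks from a busy thread to an 
idle one. Most parallel tree algorithms lead to unbalanced thread loads. The 
industry-standard benchmark for reproducing those issues is the Unbalanced 
Tree Search benchmark: 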
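
To make the imbalance concrete, here is a toy simulation (not the UTS benchmark itself, and not how any real runtime is implemented). Everything in it is an assumption made up for illustration: the comb-shaped tree produced by the hypothetical `children` function, the 4 workers, and the lockstep model where each worker processes one node per step. Python is used only to keep the sketch short and language-neutral. It compares handing each worker one subtree of the root up front against letting idle workers steal the oldest task from the fullest deque.

```python
from collections import deque

WORKERS = 4  # assumed worker count; also the fan-out of the toy tree


def children(budget):
    """Children of a node in a deliberately lopsided toy tree (hypothetical).

    One child carries almost all of the remaining work, the other three are leaves.
    """
    if budget == 0:
        return []
    return [budget - 1, 0, 0, 0]


def subtree_size(budget):
    """Number of nodes in the subtree rooted at a node with this budget."""
    size = 0
    stack = [budget]
    while stack:
        node = stack.pop()
        size += 1
        stack.extend(children(node))
    return size


def static_makespan(root_budget):
    """Root runs first, then each worker owns one child subtree and never shares.

    Assumes root_budget >= 1 so that the root has exactly WORKERS children.
    """
    loads = [subtree_size(c) for c in children(root_budget)]
    return 1 + max(loads)  # one step for the root, then the slowest worker


def stealing_makespan(root_budget, workers=WORKERS):
    """Lockstep work-stealing simulation; returns (steps, nodes processed)."""
    deques = [deque() for _ in range(workers)]
    deques[0].append(root_budget)
    steps = 0
    processed = 0
    while any(deques):
        steps += 1
        # Phase 1: each worker takes its own newest task, else steals the
        # oldest task from the currently fullest deque.
        picked = []
        for w in range(workers):
            if deques[w]:
                picked.append(deques[w].pop())
            else:
                victim = max(range(workers), key=lambda v: len(deques[v]))
                picked.append(deques[victim].popleft() if deques[victim] else None)
        # Phase 2: run the tasks; their children become available next step.
        for w, task in enumerate(picked):
            if task is not None:
                processed += 1
                deques[w].extend(children(task))
    return steps, processed


if __name__ == "__main__":
    budget = 100
    steps, processed = stealing_makespan(budget)
    print("nodes in tree:", subtree_size(budget))
    print("static partition steps:", static_makespan(budget))
    print("work-stealing steps:", steps, "nodes processed:", processed)
```

With the static split, one worker is stuck with nearly the whole tree while the other three go idle after a single node. Time-slicing those threads does not help, because the idle ones have nothing to run. With stealing, the leaves keep getting picked up by whoever is free.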

> I won't be commenting on your Picasso RFC as, in my mind for everyday users 
> and me, it is too complex, so I'll let those that perhaps have more 
> sophisticated needs contribute to it and possibly implement it if it seems to 
> fill their needs.

I think you misunderstood the RFC audience/section or I was not clear enough.

The API for library users is spawn/^. You can see 2 examples in my 

 (async will be renamed spawn and await will be renamed ^)
  * [Single producer multi-consumer task 

The rest of the RFC gives the Nim community a full picture of all the 
building blocks and considerations that go into writing a multithreading 
runtime, i.e. those are implementation "details" (as much as I hate the 

Most of the sophistication is needed to support the whole range of usage: 
from naive use by people following multithreading tutorials with pi/fibonacci 
examples, to unbalanced workloads, to high-performance computing.

The simplest implementation, with no scheduler, moves the complexity from the 
library writer to the library user, who is most likely not an expert in 
efficient multithreading.


Also, the current threadpool/spawn/^ does not work on the "Hello World!" of 
multithreading, which for better or worse is fibonacci (which is even sillier 
than the pi algorithm), see 
 Note that even GCC's version of OpenMP chokes on that benchmark because it 
uses a naive global task queue (like Nim's threadpool), which is a contention 
point, while LLVM's OpenMP uses work-stealing.
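
As a rough sketch of why fibonacci is such a stress test for a scheduler, the snippet below counts how many calls the naive recursion makes. It assumes, purely for illustration, that every recursive call would become one task; real benchmarks usually spawn only one branch, and the hypothetical `fib_call_tree` helper is not part of any library. Python is used only to keep it language-neutral.

```python
def fib_call_tree(n):
    """Return (fib(n), calls), where calls counts every recursive invocation."""
    if n < 2:
        return n, 1
    a, calls_a = fib_call_tree(n - 1)
    b, calls_b = fib_call_tree(n - 2)
    return a + b, calls_a + calls_b + 1


if __name__ == "__main__":
    for n in (10, 20, 25):
        value, calls = fib_call_tree(n)
        print(f"fib({n}) = {value}: {calls} calls, at most one addition each")
```

The number of calls grows as fast as the result itself, and each call does at most one addition. Under the one-task-per-call assumption, nearly all the time goes into enqueueing and dequeueing tasks, so a single shared queue becomes the bottleneck long before the additions do.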
