@Araq, @mratsim: As promised, I'm posting some code that shows the current 
difficulty of bypassing the overly simplistic (and performance-costly) default 
behaviour of the current channels and threadpool libraries, which deepCopy all 
GC'ed refs so that they won't be prematurely destroyed if they go out of scope 
in the thread where they were created.
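As a minimal sketch of that default behaviour (my illustration, not from the 
linked code; assumes the stock `threadpool` module of the old runtime, 
compiled with `--threads:on`): a GC'ed ref passed across a `spawn` boundary 
is deep-copied, so the worker thread operates on its own copy of the heap 
cell rather than sharing the original.

```nim
import threadpool

type Data = ref object
  value: int

proc work(d: Data): int =
  # `d` here is the deep copy made at the spawn boundary; mutating it
  # would not be visible through the ref held by the spawning thread
  d.value * 2

proc main() =
  let d = Data(value: 21)
  let fv = spawn work(d)   # deepCopy of `d` happens here
  echo ^fv                 # blocks until the FlowVar is ready; prints 42

main()
```

That copy is what the wrapper modules in the linked code exist to avoid.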

I was wrong previously in thinking that my difficulties in making this work 
were due to generic procs and thus nested templates; in fact, I think the 
problems were due to all the "cruft and compiler magic" not considering 
recursive algorithms, where threads may be spawned from within threads ad 
nauseam, which is also likely the problem that @mratsim has had in running his 
benchmarks. Accordingly, this benchmark **wraps absolutely everything that has 
to do with GC** (just in case) and should be able to handle just about any 
reasonable level of recursion, although I'm not sure what nesting levels of 
"toDispose" lists generated by the protect/dispose pairs will do to the 
stack - there isn't much I can do about those without compiler magic anyway, 
and we certainly don't want to add more of that if it isn't necessary.

The code linked below implements a little benchmark that cycles through 
spawning 10,000 (trivial) tasks from the threadpool, using a customizable 
iterator implemented manually with closures, and includes a "polymorphic" 
converter function passed as a closure parameter. It is close to what I 
require to cleanly implement my version of the "Ultimate Sieve of 
Eratosthenes in Nim" algorithm, which does require the ability to nest and 
recursively spawn threads. I've divided the code into modules by 
functionality (the file tabs across the top of the source code section) for 
ready reference and so you can see that the actual benchmark is fairly 
trivial; most of the code is there to make deepCopy unnecessary by preserving 
the GC'ed refs in the ways Nim currently provides. I've tried to make the 
code concise and elegant, but the need to do this is **UGLY**.
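For anyone who doesn't want to open the link, a hypothetical skeleton of the 
benchmark's shape looks roughly like this (names invented, task count shrunk, 
and the converter shown as a plain proc - passing it as a closure parameter 
is exactly the part that needs the wrappers):

```nim
import threadpool

proc makeCounter(limit: int): iterator (): int =
  # manually written closure iterator: yields one task index per call
  return iterator (): int =
    for i in 0 ..< limit:
      yield i

proc square(x: int): int = x * x   # stands in for the converter closure

proc runAll(): int =
  let next = makeCounter(10)       # 10 here instead of 10_000, for brevity
  var flows: seq[FlowVar[int]]
  for i in next():                 # drain the iterator, spawning one task each
    flows.add spawn square(i)
  for fv in flows:                 # collect and combine the results
    result += ^fv

echo runAll()                      # sums the squares 0..9, i.e. 285
```

In the real code the converter is a closure and the spawns can recurse, which 
is what forces the RCRef-style wrapping around every GC'ed ref.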

However, this is likely the easiest "Plan B" to implement if the "newruntime" 
doesn't work out, as compared to the huge project of implementing a 
multi-threaded GC - just make the extra support modules available as one or 
more libraries. This [link on 
Wandbox](https://wandbox.org/permlink/TSMrMyVVcikS9Bty) is the runnable code. 
In full release mode it runs in about 400 milliseconds on an Intel Xeon Sandy 
Bridge CPU at 2.5 GHz, for which we are given the use of three threads, two 
of which likely share a core (Hyper-Threaded). Thus, about 2.5 billion total 
cycles are used across all available threads, which means each thread spawn 
costs about 250 thousand cycles, or about 100 microseconds, including 
overheads. This sounds like a lot but actually isn't bad, considering it 
takes something like ten to a hundred times as long to do this by "spinning 
up a new thread" for every task.
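The back-of-the-envelope arithmetic behind those figures can be checked 
directly (the 2.5 "effective threads" is my own assumption for how the shared 
Hyper-Threaded core gets discounted):

```nim
const
  wallSeconds = 0.4      # measured: ~400 ms in full release mode
  clockHz = 2.5e9        # 2.5 GHz Sandy Bridge
  effThreads = 2.5       # three threads, two sharing one HT core
  tasks = 10_000.0       # spawns performed by the benchmark

let totalCycles = wallSeconds * clockHz * effThreads  # ~2.5 billion cycles
let cyclesPerSpawn = totalCycles / tasks              # ~250 thousand cycles
let usPerSpawn = cyclesPerSpawn / clockHz * 1e6       # ~100 microseconds
echo totalCycles, " ", cyclesPerSpawn, " ", usPerSpawn
```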

Now, if "newruntime" does work out, then for algorithms such as mine, where 
single ownership is adequate, much of this code would just "go away": owned 
refs would replace "RCRef", no wrappers would be required for closures or for 
refs because they would be owned, and the channels and threadpool libraries 
could be rewritten to be much simpler without the "cruft and compiler magic", 
**not depending on using global resources**, and thus written to be 
completely recursive if necessary.
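To illustrate, a sketch only, using the `owned` annotation from the 
newruntime proposal (this needs `--newruntime` to compile, and the names here 
are illustrative, not taken from my code):

```nim
type
  Node = ref object
    next: owned Node   # exactly one owning reference at any time
    value: int

proc newNode(value: int): owned Node =
  Node(value: value)   # ownership moves out through `result`

proc consume(n: owned Node) =
  # the receiving proc (or thread) takes ownership; when `n` goes out
  # of scope here it is destroyed deterministically - no GC, no deepCopy
  echo n.value
```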

All that would be left would be the benchmark itself, and even that would be 
much simpler, more concise, and more elegant through not having to call into 
the extra wrappers. It should also be somewhat faster (perhaps twice as fast) 
through not having the GC fighting us in the background and through the more 
direct forms of code.

I had started work on converting this to use my own emulation of owned refs 
(since release builds won't run with threading and newruntime on at the same 
time), but I don't think I'll pursue it, as emulating the new closures is 
quite hard without compiler help and there is little point if newruntime 
support for threads is imminent, as it appears to be. I'll reserve the effort 
for when that happens, as I think it is reasonably clear from this work how 
much easier that could make working across threads!
