On Wednesday, 5 February 2014 at 15:38:43 UTC, Bienlein wrote:
On a very well-equipped machine, 10,000 threads is about the maximum for the JVM. Now for D, 1,000,000 kernel threads are not a problem!? Well, I'm a D newbie and a bit confused now... Have to ask some questions trying not to bug people. Apparently, a kernel thread in D is not an OS thread. Does D have its own threading model then? Couldn't see that from what I found on dlang.org. Is the measurement result for fibers so much better than for threads because fibers have less context-switching overhead? Will actors in D benefit from your FiberScheduler when it has been released? Do you know which version of D your FiberScheduler is planned to be included in?
Well, I spawned 1 million threads, but there's no guarantee that 1 million were running concurrently. So I decided to run a test. I forced the code to block until all threads were started, and when using kernel threads this hung with 2047 threads running (this is on OSX). So I think OSX has a hard internal limit of 2047 threads. It's possible this can be extended somehow, but I didn't investigate. And since I don't currently have a great way to block fibers, what I was doing there was a busy wait, which was just slow going while waiting for all the threads to spin up.

Next, I figured I'd keep a high-water mark for the concurrent thread count in the code I posted yesterday. Both fibers and kernel threads topped out at about 10. For fibers, this makes perfect sense given the yield strategy (each client thread yields 10 times while running), and I guess the scheduling for kernel threads made that come out about the same. So the fact that I was able to spawn 1 million kernel threads doesn't actually mean a whole lot. I should have thought about that more yesterday.

Because of the added synchronization for counting threads, everything slowed down a bit, so I reduced the number of threads to 100,000. Here are some timings:

$ time concurrency threads
numThreadsToSpawn = 100000, maxConcurrent = 12

real    1m8.573s
user    1m22.516s
sys     0m27.985s

$ time concurrency fibers
numThreadsToSpawn = 100000, maxConcurrent = 10

real    0m5.860s
user    0m3.493s
sys     0m2.361s

So in short, a "kernel thread" in D (which is what you get by instantiating a core.thread.Thread) is an OS thread. The fibers are user-space threads that context switch only when explicitly yielded, and use core.thread.Fiber.

One thing to note about the FiberScheduler is that I haven't sorted out a solution for thread-local storage. So if you're using the FiberScheduler and each "thread" is accessing some global static data it expects to be exclusive to itself, you'll end up with an undefined result.
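The thread-vs-fiber distinction above can be shown with a small self-contained sketch using only druntime (Thread, Fiber, call, and yield are the real core.thread API; the demo function and log strings are just for illustration):

```d
import core.thread;
import std.stdio;

// Logs the interleaving of a kernel thread and a fiber. The thread is
// scheduled preemptively by the OS; the fiber switches context only at
// the explicit Fiber.yield().
string[] demo()
{
    string[] log;

    // core.thread.Thread is a real OS thread.
    auto t = new Thread({ log ~= "thread"; });
    t.start();
    t.join();           // wait, so the append above is safely visible

    // core.thread.Fiber is a user-space context: call() resumes it,
    // Fiber.yield() suspends it back to the caller.
    auto f = new Fiber({
        log ~= "fiber:1";
        Fiber.yield();  // cooperative context switch back to caller
        log ~= "fiber:2";
    });
    f.call();           // runs until the yield
    log ~= "main";      // we regain control while the fiber is suspended
    f.call();           // resumes past the yield and runs to completion
    return log;
}

void main()
{
    writeln(demo());    // ["thread", "fiber:1", "main", "fiber:2"]
}
```

Because the fiber only runs when call() is invoked, the caller controls exactly where the context switches happen, which is what makes the per-switch cost so much lower than an OS-level reschedule.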
Making D's "thread-local by default" actually be fiber-local when using fibers is a pretty hard problem to solve, and it can be dealt with later if the need arises. My hope was that by making the choice of scheduler user-defined, it's up to the user to choose the appropriate threading model for their application, and we can hopefully sidestep the need to sort this out. It was the main issue blocking my doing this ages ago, and I didn't think of this pluggable approach until recently.

The obvious gain here is that std.concurrency is no longer strictly limited by the overhead of kernel threads, and so it can be used more according to the actor model, as was originally intended. I can imagine more complex schedulers multiplexing fibers across a pool of kernel threads, for example. The FiberScheduler is more a proof of concept than anything.

As for when this will be available... I will have a pull request sorted out shortly, so you could start playing with it soon. Inclusion in an actual release means a review and such, but as this is really just a fairly succinct change to an existing module, I hope it won't be terribly contentious.
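From user code, the pluggable approach might look something like the sketch below. This assumes the scheduler is exposed as a global that spawn() consults (the names FiberScheduler, scheduler, and scheduler.start follow the description above, but the exact API is an assumption until the pull request lands); the messaging code itself is unchanged std.concurrency:

```d
import std.concurrency;
import std.stdio;

// An ordinary std.concurrency "thread": under the FiberScheduler it
// becomes a fiber multiplexed onto the spawning kernel thread.
void client()
{
    auto n = receiveOnly!int();
    ownerTid.send(n * 2);
}

int run()
{
    int result;
    // Swapping in a different Scheduler changes what spawn() creates
    // without touching any of the messaging code below.
    scheduler = new FiberScheduler;
    scheduler.start({
        auto tid = spawn(&client);
        tid.send(21);
        result = receiveOnly!int();
    });
    return result;
}

void main()
{
    writeln(run());   // 42
}
```

The point of the design is that client() never knows or cares whether it is running on a kernel thread or a fiber; only the scheduler assignment in run() changes.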
In Go you can easily spawn 100,000 goroutines (aka green threads), probably several hundred thousand. Being able to spawn way more than 100,000 threads in D with little context-switching overhead, as with fibers, puts you basically in the same league as Go. And D is a really rich language, contrary to Go. This looks cool :-)
Yeah, I think it's exciting. I had originally modeled std.concurrency after Erlang and like the way the syntax worked out, but using kernel threads is limiting. I'm interested to see how this scales once people start playing with it. It's possible that some tuning of when yields occur may be needed as time goes on, but that really needs more eyes than my own, and probably multiple real-world tests as well.

As some general background on actors vs. CSP in std.concurrency, I chose actors for two reasons. First, the communication model for actors is unstructured, so it's adaptable to a lot of different application designs. If you want structure you can impose it at the protocol level, but it isn't necessary to do so--simply using std.concurrency requires practically no code at all for the simple case. And second, I wasn't terribly fond of the "sequential" part of CSP. I really want a messaging model that scales horizontally across processes and across hosts, and the CSP algebra doesn't work that way. At the time, I found a few algebras that were attempting to basically merge the two approaches, but nothing really stood out.
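The "practically no code at all for the simple case" point is easy to demonstrate with the existing std.concurrency API: spawn, send, and receive are all it takes for an actor round-trip (worker and askWorker are hypothetical names for this sketch; by default spawn creates a kernel thread):

```d
import std.concurrency;
import std.stdio;

// A minimal actor: the "protocol" is nothing more than the set of
// receive handlers, imposed by convention rather than by the library.
void worker()
{
    bool done;
    while (!done)
    {
        receive(
            (int n)    { ownerTid.send(n + 1); },   // reply to sender
            (string s) { done = (s == "stop"); }    // shutdown message
        );
    }
}

int askWorker(int n)
{
    auto tid = spawn(&worker);   // an OS thread by default
    tid.send(n);
    auto reply = receiveOnly!int();
    tid.send("stop");            // unstructured: any type can be sent
    return reply;
}

void main()
{
    writeln(askWorker(41));   // 42
}
```

Nothing here constrains who may send what to whom; any structure (request/reply pairing, shutdown handshakes) lives entirely in the message types the two sides agree on.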
