On 01/08/2010 19:17, dsimcha wrote:
== Quote from Jonathan M Davis ([email protected])'s article
Okay. From what I can tell, it seems to be a recurring pattern with threads that
it's useful to spawn a thread, have it do some work, and then have it return the
result and terminate. The appropriate way to do that seems to be to spawn the
thread with the data that needs to be passed and then use send to send what
would normally be the return value before the function (and therefore the
spawned thread) terminates. I see 2 problems with this, both stemming from
immutability.
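For readers following along, the pattern described above might look roughly
like the sketch below. It assumes std.concurrency's spawn, send, receiveOnly
and thisTid; the worker function and its data are hypothetical and only there
for illustration.

```d
import std.concurrency;
import std.stdio;

// Hypothetical worker: sums its input and sends the total back to the
// owner thread in place of a return value.
void worker(Tid owner, immutable(int)[] data)
{
    long total = 0;
    foreach (x; data)
        total += x;
    owner.send(total); // stands in for "return total;"
}

void main()
{
    // The arguments to spawn must not contain unshared mutable aliasing,
    // which is why the input is immutable here.
    immutable(int)[] data = [1, 2, 3, 4];
    spawn(&worker, thisTid, data);

    // Block until the worker sends its result.
    auto result = receiveOnly!long();
    writeln(result);
}
```

Note that both the input and the "returned" value have to be immutable, shared
or plain values, which is where the immutability problems come in.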
I think the bottom line is that D's threading model is designed to put safety
and simplicity over performance and flexibility. Given the number of bugs that
are apparently generated when using threading for concurrency in large-scale
software written by hordes of programmers, this may be a reasonable tradeoff.
Within the message-passing model, one thing that would help a lot is a Unique
type that can be implicitly and destructively converted to immutable or shared.
In D as it stands right now, immutable is basically useless in all but the
simplest cases because it's just too hard to build complex immutable data
structures, especially if you want to avoid unnecessary copying or having to
rely on casts and manually checked assumptions in at least small areas of the
program. In theory, immutable solves tons of problems, but in practice it
solves very few. While I don't understand shared that well, I guess a Unique
type would help in creating shared data, too.
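As an illustration of the "manually checked assumptions" route, here is a
minimal sketch using std.exception.assumeUnique, which exists in Phobos today.
The buildSquares function is hypothetical. This is not the Unique type
proposed above, only the current workaround it would replace.

```d
import std.exception : assumeUnique;

// Hypothetical builder: fills a mutable buffer, then hands it out as
// immutable without copying.
immutable(int)[] buildSquares(size_t n)
{
    auto buf = new int[n];
    foreach (i, ref x; buf)
        x = cast(int)(i * i);

    // Manually checked assumption: no other reference to buf exists.
    // The compiler does not verify this; assumeUnique only casts the
    // array and nulls out the local variable.
    return assumeUnique(buf);
}

void main()
{
    auto squares = buildSquares(5);
    assert(squares == [0, 1, 4, 9, 16]);
}
```

Nothing stops a stray mutable alias from slipping out before the call, which
is exactly the kind of hole a real Unique type would close.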
There are two reasons for using multithreading: Parallelism (using multiple
cores to increase throughput) and concurrency (making things appear to be
happening simultaneously to decrease latency; this makes sense even on a
single-core machine). One may come as a side effect of the other, but usually
only one is the goal. It sounds like you're looking for parallelism. When using
threading for parallelism as opposed to concurrency, this tradeoff of
simplicity and safety in exchange for flexibility and performance doesn't work
so well because:
1. When using threading for parallelism instead of concurrency, it's reasonable
to do some unsafe stuff to get better performance, since performance is the
whole point anyhow.
2. Unlike the concurrency case, the parallelism case usually occurs only in
small hotspots of a program, or in small scientific computing programs. In
these cases it's not that hard for the programmer to manually track what's
shared, etc.
3. In my experience at least, parallelism often requires finer grained
communication between threads than concurrency. For example, an OS timeslice is
about 15 milliseconds, meaning that on single core machines threads being used
for concurrency simply can't communicate more often than that. I've written
useful parallel code that scaled to at least 4 cores and required communication
between threads several times per millisecond. It could have been written more
efficiently w.r.t. communication between threads, but it would have required a
lot more memory allocations and been less efficient in other respects.
While I completely agree that message passing should be D's **flagship**
threading model because it's been proven to work well in a lot of cases, I'm
not sure if it should be the **only** one well-supported out of the box because
it's just too inflexible when you want pull-out-all-stops parallelism. As
Robert Jacques mentioned, I've been working on a parallelism library. The code
is at:
http://dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d
The docs are at:
http://cis.jhu.edu/~dsimcha/parallelFuture.html
I've been thinking lately about how to integrate this into the new threading
model, as it's currently completely unsafe, doesn't use shared at all, and was
written before the new threading model was implemented (core.thread still takes
an unshared delegate). I think before we can solve the problems you've brought
up, we need to clarify how non-message-passing-based multithreading (i.e. using
shared) is going to work in D, as right now it is completely unclear, at least
to me.
I completely agree with everything you said and I really dislike how D2
currently seems to virtually impose an application architecture based on
the message passing model if you don't want to circumvent and thus break
the entire type system. While I do agree that message passing makes a
lot of sense as the default choice, there also has to be well
thought-out and extensive support for the shared memory model if D2 is
really focusing on the concurrency issue as much as it claims.
Personally, I've found hybrid architectures where both models are
combined as needed to be the most flexible and best performing approach
and there is no way a language touted to be a systems language should
impose one model over the other and stop the programmer from doing
things the way he wants.
/Max