On Sun, 01 Aug 2010 06:24:18 -0400, Jonathan M Davis
<[email protected]> wrote:
Okay. From what I can tell, it seems to be a recurring pattern with threads
that it's useful to spawn a thread, have it do some work, and then have it
return the result and terminate. The appropriate way to do that seems to be
to spawn the thread with the data that needs to be passed and then use send
to send what would normally be the return value before the function (and
therefore the spawned thread) terminates. I see 2 problems with this, both
stemming from immutability.
1. _All_ of the arguments passed to spawn must be immutable. It's not that
hard to be in a situation where you need to pass it arguments that the
parent thread will never use, and it's highly probable that that data will
have to be copied to make it immutable so that it can be passed. The result
is that you're forced to make pointless copies. If you're passing a lot of
data, that could be expensive.
2. _All_ of the arguments returned via send must be immutable. In the
scenario that I'm describing here, the thread is going away after sending
the message, so there's no way that it's going to do anything with the data,
and having to copy it to make it immutable (as will likely have to be done)
can be highly inefficient.
Is there a better way to do this? Or if not, can one be created? It seems to
me that it would be highly desirable to be able to pass mutable reference
types between threads where the thread doing the receiving takes control of
the object/array being passed. Due to D's threading model, a copy may still
have to be done behind the scenes, but if you could pass mutable data across
while passing ownership, you could have at most 1 copy rather than the 2 - 3
copies that would have to be taking place when you have a mutable object
that you're trying to send across threads (so, one copy to make it
immutable, possibly a copy from one thread local storage to another of the
immutable data (though I'd hope that that wouldn't require a copy), and one
copy on the other end to get mutable data from the immutable data). As it
stands, it seems painfully inefficient to me when you're passing anything
other than small amounts of data across.
Also, this recurring pattern that I'm seeing makes me wonder if it would be
advantageous to have an addition to std.concurrency where you spawned a
thread which returned a value when it was done (rather than having to use a
send with a void function), and the parent thread used a receive call of
some kind to get the return value. Ideally, you could spawn a series of
threads which were paired with the variables that their return values would
be assigned to, and you could do it all as one function call.
Overall, I really like D's threading model, but it seems to me that it could
be streamlined a bit.
- Jonathan M Davis
Hi Jonathan,
It sounds like what you really want is a task-based parallel programming
library, as opposed to concurrent threads. I'd recommend Dave Simcha's
parallelFuture library if you want to play around with this in D
(http://www.dsource.org/projects/scrapple/browser/trunk/parallelFuture/parallelFuture.d).
However, parallelFuture is currently unsafe - you need to make sure that,
logically speaking, the data the task is being passed is immutable.
Shared/const/immutable delegates have been brought up before as a way to
formalize the implicit assumptions of libraries like parallelFuture, but
nothing has come of it yet.
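To give a feel for the task style, here is a minimal sketch. It assumes the `task` / `taskPool` / `yieldForce` interface of Phobos's std.parallelism, which grew out of parallelFuture; the names in the parallelFuture.d file linked above may differ. `sumOf` and `parallelTotal` are hypothetical names made up for the example. Each task returns its value directly, so no explicit send/receive pair is needed.

```d
import std.parallelism;
import std.stdio;

// Hypothetical work function: it only reads its (immutable) input.
long sumOf(immutable(int)[] data)
{
    long total = 0;
    foreach (x; data)
        total += x;
    return total;
}

// Runs two tasks on the default pool and combines their return values.
long parallelTotal(immutable(int)[] a, immutable(int)[] b)
{
    auto ta = task!sumOf(a);   // create the tasks...
    auto tb = task!sumOf(b);
    taskPool.put(ta);          // ...and queue them on the thread pool
    taskPool.put(tb);

    // yieldForce blocks until the task is done and yields its return value.
    return ta.yieldForce + tb.yieldForce;
}

void main()
{
    writeln(parallelTotal([1, 2, 3], [10, 20, 30]));
}
```

Note that nothing here stops you from handing a task mutable data; keeping the inputs immutable is a convention the caller has to uphold.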
As for std.concurrency, immutability is definitely the correct way to go,
even if it means extra copying: for most jobs the processing should
greatly outweigh the cost of copying and thread initialization (though
under the hood thread pools should help with the latter). A large amount
of experience dictates that shared mutable data, let alone unprotected
mutable data, is a bug waiting to happen.
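For comparison, the spawn-then-send pattern from the original post can be sketched with std.concurrency alone. This is only an illustration: `worker` and `sumInThread` are hypothetical names. The input is an immutable slice, so once the data is immutable it is passed by reference rather than copied again, and the result is a plain value type, which needs no qualifier at all.

```d
import std.concurrency;
import std.stdio;

// Hypothetical worker: does its job, sends the "return value", then ends.
void worker(Tid owner, immutable(int)[] data)
{
    long total = 0;
    foreach (x; data)
        total += x;
    send(owner, total);   // a long has no references, so it can be sent as-is
}

// Spawns the worker and blocks until its single result message arrives.
long sumInThread(immutable(int)[] data)
{
    spawn(&worker, thisTid, data);
    return receiveOnly!long();
}

void main()
{
    immutable(int)[] data = [1, 2, 3, 4];
    writeln(sumInThread(data));
}
```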
On a more practical note, relaxing either 1) or 2) can cause major problems
with certain modern GCs, so at a minimum casts should be involved.
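One place where a cast can stand in for a copy is when the worker builds its result in a fresh buffer that nothing else references. A minimal sketch, assuming `assumeUnique` from std.exception; `worker` and `squares` are hypothetical names. The cast is unchecked: the compiler simply trusts that no other mutable reference to the buffer exists, so the burden is on the programmer.

```d
import std.concurrency;
import std.exception : assumeUnique;
import std.stdio;

// Hypothetical worker: fills a private mutable buffer, then hands it over.
void worker(Tid owner, size_t n)
{
    auto buf = new int[](n);           // mutable while it is being built
    foreach (i, ref x; buf)
        x = cast(int)(i * i);

    // assumeUnique casts to immutable without copying and nulls out `buf`,
    // so this thread keeps no mutable reference to the data.
    send(owner, assumeUnique(buf));
}

// Spawns the worker and receives the finished, now-immutable array.
immutable(int)[] squares(size_t n)
{
    spawn(&worker, thisTid, n);
    return receiveOnly!(immutable(int)[])();
}

void main()
{
    writeln(squares(4));
}
```

This covers the return path in point 2) without a copy on the sending side; getting mutable data back out on the receiving side would still need either a copy or a further cast.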