On 3/24/2011 10:21 PM, Sönke Ludwig wrote:
Indeed this pattern solves the problem of waiting for the completion of a
specific task. It also avoids a huge potential for deadlocks that a
general yield() that does not take a task would have. However, it does
not solve the general problem of one task waiting for another, where the
waiting could be on a condition variable or just on a mutex that is used
in the middle of the task's execution.
Can you elaborate and/or provide an example of the "general" problem?
I'm not quite sure what you're getting at.
I have one very specific scenario that I can only sketch here. Suppose
you have some kind of complex computation going on in the ThreadPool.
This computation is done by a large set of tasks, where each task
depends on the result of one or more other tasks. One task is
responsible for coordinating the work - it spawns tasks and waits for
their completion so it can spawn new tasks for which the results are now
available.
As I've said before in related discussions, you are _probably_ better
off using one of the high level primitives instead of using tasks
directly in these cases. If not, I'd prefer to improve the higher level
primitives and/or create new ones if possible. (Feel free to suggest
one if you can think of it.) Tasks are, IMHO, too low level for
anything except basic future/promise parallelism and implementing higher
level primitives. Incidentally, the situation you describe (a
coordinator task creating lots of worker tasks) is exactly how amap(),
reduce() and parallel foreach work under the hood. This complexity is
completely encapsulated, though.
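Just to illustrate from the user's side, here is a minimal sketch of
what those primitives look like with the current API; all of the
coordinator/worker machinery lives below these two calls:

import std.math, std.parallelism, std.stdio;

void main() {
    auto nums = [1.0, 4.0, 9.0, 16.0];

    // amap(): the pool spawns and joins worker tasks internally and
    // hands back the finished array.
    auto roots = taskPool.amap!sqrt(nums);

    // Parallel foreach: same coordinator/worker pattern, also
    // completely hidden from the caller.
    foreach (ref x; taskPool.parallel(nums))
        x *= 2;

    writeln(roots, " ", nums);
}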
First thing here is that you do not want to do the waitForce() kind of
waiting in the coordinator task, because this might keep the coordinator
busy with an expensive task while it could already be spawning new
tasks, since in the meantime some other tasks may have already finished.
I assume you mean yieldForce().
However, if you wait on a condition variable instead (which is signaled
after each finished task), and if you can have multiple computations of
this kind running in parallel, you can immediately run into the
situation that the thread pool is crowded with coordinator tasks that
are all waiting on their condition variables, which will never be
signaled because no worker tasks can be executed.
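Roughly, the situation I mean, as a contrived sketch (coordinate() is a
made-up name, and a real program would also guard the wait with a flag
against lost wakeups - the point here is only the blocking):

import core.sync.condition, core.sync.mutex, std.parallelism;

void coordinate(Mutex m, Condition done) {
    // A worker whose completion would signal this coordinator.
    auto worker = task({
        // ... part of the computation ...
        synchronized (m) done.notify();
    });
    taskPool.put(worker);          // the worker sits in the queue
    synchronized (m) done.wait();  // the coordinator blocks a pool thread
}

void main() {
    auto m = new Mutex;
    auto done = new Condition(m);
    // One coordinator per pool thread: every thread blocks in wait(),
    // and the queued workers that would notify() them can never run.
    foreach (i; 0 .. taskPool.size + 1)
        taskPool.put(task!coordinate(m, done));
    taskPool.finish(true);  // deadlock
}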
I assume you're talking about a condition variable other than the one
yieldForce() uses. As mentioned previously, in the specific case of
yieldForce() this is a solved problem. In the general case I can see
the problem.
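For reference, the pattern that makes it a solved problem there: if the
task is still sitting in the queue, yieldForce executes it inline in the
waiting thread, so the dependency always makes progress:

import std.parallelism;

int heavy(int x) { return x * x; }

void main() {
    auto t = task!heavy(21);
    taskPool.put(t);
    // Runs heavy() right here if the task hasn't been started yet;
    // otherwise waits for the running task to finish.
    auto r = t.yieldForce;
    assert(r == 441);
}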
This is only one example; basically, this problem can arise in all
cases where one task depends on another task through some form of
waiting that does not execute the dependency the way waitForce() does.
Hmm, ok, I definitely understand the problem now.
But what I wanted to say is: even if it may be difficult to implement
such thread caching now, adding a means to execute a Task in its own
thread to the ThreadPool now allows for such an optimization later (it
could even coexist with Task.executeInNewThread()).
I can't really comment because I still don't understand this very well.
I hope I could make it a little clearer what I mean. The problem is
just that the system I'm talking about is quite complex, and it's not
easy to find good and simple examples within it. The problems, of
course, arise only in the most complex paths of execution.
What I'm not sure about is whether executeInNewThread() is supposed to
be useful just because it is sometimes nice to have the fine-grained
parallelism of the OS scheduler as opposed to task granularity, or
whether the advantage is supposed to be the efficiency gained because
the thread pool is not created. In the latter case, caching some
threads for reuse by an executeInOwnThread() method should lead to
better performance in almost any case where thread creation overhead is
relevant.
Ok, now I'm starting to understand this. Please correct me (once you've
gotten a good night's sleep and can think again) wherever this is wrong:
1. As is currently the case, executeInNewThread() is _guaranteed_ to
start the task immediately. There is never a queue involved.
2. Unlike the current implementation, executeInNewThread() may use a
cached thread. It will **NOT**, however, put the task on a queue or
otherwise delay its execution. If no cached thread is available, it
will create a new one and possibly destroy it when the task is done.
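To make point 2 concrete, here is a minimal sketch of one possible
shape (ThreadCache, execute() and maxCached are invented names for
illustration, not proposed API; idle threads park on a condition
variable and work is handed off directly, never queued):

import core.sync.condition, core.sync.mutex, core.thread;

alias WorkFn = void delegate();

final class ThreadCache {
    private Mutex m;
    private Condition cond;
    private WorkFn pending;     // at most one handoff in flight
    private size_t idle;
    private immutable size_t maxCached;

    this(size_t maxCached = 4) {
        m = new Mutex;
        cond = new Condition(m);
        this.maxCached = maxCached;
    }

    // Hand the work to a parked thread if one exists; otherwise start
    // a fresh thread, exactly like executeInNewThread() does today.
    void execute(WorkFn f) {
        synchronized (m) {
            if (idle > 0 && pending is null) {
                pending = f;
                cond.notify();
                return;
            }
        }
        new Thread({ workerLoop(f); }).start();
    }

    private void workerLoop(WorkFn first) {
        for (auto f = first; f !is null; f = park())
            f();
    }

    // Park in the cache and wait for the next handoff; return null if
    // the cache is already full, letting this thread terminate.
    private WorkFn park() {
        synchronized (m) {
            if (idle >= maxCached)
                return null;
            idle++;
            scope (exit) idle--;
            while (pending is null)
                cond.wait();
            auto f = pending;
            pending = null;
            return f;
        }
    }
}

This keeps the "never delayed" guarantee from point 1 while amortizing
the thread creation cost.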
Thanks for this suggestion. Now that I (think I) understand it, it
makes sense in principle. The devil may be in the details, though.
1. How many threads should be cached? I guess this could just be
configurable with some reasonable default.
2. Should the cache be lazily or eagerly created? I'd assume lazily.
3. Where would these threads be stored? I think they probably belong
in some kind of thread-safe global data structure, **NOT** in a TaskPool
instance.
4. How would we explain to people what the cache is good for and how to
use it? The fact that you're proposing it and even you find this
difficult to convey makes me skeptical that this feature is worth the
weight it adds to the API. Maybe you'll find it easier once you get
some sleep. (I understand the problem it solves at an abstract level
but have never encountered a concrete use case. It also took me a long
time to understand it.)
5. It would break some relaxations of what @safe tasks can do when
started via executeInNewThread().
6. This whole proposal might fit better in std.concurrency, by using a
thread cache for spawn().
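E.g. the call that would benefit (worker() here is just a placeholder):
right now every spawn() pays the full thread creation cost, and a cache
could reuse an idle thread instead:

import std.concurrency, std.stdio;

void worker(int n) {
    writeln("computed ", n * n);
}

void main() {
    // With a thread cache behind spawn(), this would grab a parked
    // thread when one is available instead of creating a new one.
    auto tid = spawn(&worker, 42);
}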