On Thu, Jun 06, 2019 at 03:25:24AM +0900, 김규래 wrote:
> Hi, thanks for the detailed explanation.
> I think I now get the picture.
> Judging from my current understanding, the task-parallelism currently works 
> as follows: 
> 1. Tasks are placed in a global shared queue.

It isn't a global shared queue, but a per-team shared queue, in fact 3
different ones, guarded by the same per-team mutex team->task_lock though:
team->task_queue        used on barriers
task->children_queue    used for #pragma omp taskwait
taskgroup->taskgroup_queue      used at the end of #pragma omp taskgroup

> 2. Workers consume the tasks by bashing the queue in a while loop, just as 
> self-scheduling (dynamic scheduling)/
>  
> Then the improvements including work-stealing must be done by:
> 1. Each worker holds a dedicated task queue reducing the resource contention.
> 2. The tasks are distributed in a round-robin fashion
> 3. work-stealing will resolve the load imbalance.
>  
> If the above statements are correct, I guess the task priority should be 
> given some special treatment?

Yes, one thing to consider is task priority, another thing is what to do
on those #pragma omp taskwait or #pragma omp taskgroup waits where while one
can schedule most of the tasks (the OpenMP specification has restrictions on
what can be scheduled, but the restrictions are mostly relevant to untied
tasks which we don't support anyway (well, ignore untied and schedule them
like tied tasks)), but it might be more beneficial (and what is currently
implemented) to queue only the tasks that will help satisfying what we are
waiting on.  All of priority and only scheduling task or taskgroup children
instead of all tasks are just hints, the implementation may choose other
tasks.
There is another thing though, tasks with dependencies, where
gomp_task_handle_depend which needs to look up addresses in hash tables
for the dependencies also uses the team->task_lock mutex.

        Jakub

Reply via email to