Hello

This patch addresses the intermittent hanging seen in the libgomp.c-c++-common/task-detach-6.f90 test.

The main problem is due to the 'omp taskwait' in the test. GOMP_taskwait can run tasks, so for correct semantics it needs to be able to place finished tasks that have unfulfilled completion events into the detach queue, rather than just finishing them immediately (in effect ignoring the detach clause).

Unfinished tasks in the detach queue are still children of their parent task, so they can appear in next_task in the main GOMP_taskwait loop. If next_task is fulfilled then it can be finished immediately, otherwise it will wait on taskwait_sem.

omp_fulfill_event needs to be able to post the taskwait_sem semaphore as well as wake the team barrier. Since the semaphore is located on the parent of the task whose completion event is being fulfilled, I have changed the event handle to being a pointer to the task instead of just the completion semaphore in order to access the parent field.

This type of code is currently used to wake the threads for the team barrier:

  if (team->nthreads > team->task_running_count)
    gomp_team_barrier_wake (&team->barrier, 1);

This issues a gomp_team_barrier_wake if any of the threads are not running a task (and so might be sleeping). However, detach tasks that are queued waiting for a completion event are currently included in task_running_count (because the finish_cancelled code executed later decrements it). Since gomp_barrier_handle_tasks does not block if there are unfinished detached tasks remaining (since during development I found that doing so could cause deadlocks in single-threaded code), threads could be sleeping even if team->nthreads == team->task_running_count, and this code would fail to wake them. I fixed this by decrementing task_running_count when queuing an unfinished detach task, and skipping the decrement in finish_cancelled if the task was a queued detach tash. I added a new gomp_task_kind GOMP_TASK_DETACHED to mark these type of tasks.

I have tried running the task-detach-6 testcase (C and Fortran) 10,000 iterations at a time using 32 threads, on a x86_64 Linux machine with GCC built with --disable-linux-futex, and no hangs. I have checked that it bootstraps, and noticed no regressions in the libgomp testsuite when run without offloading.

With Nvidia and GCN offloading though, task-detach-6 hangs... I _think_ the reason why it 'worked' before was because the taskwait allowed tasks with detach clauses to always complete immediately after execution. Since that backdoor has been closed, task-detach-6 hangs with or without the taskwait.

I think GOMP_taskgroup_end and maybe gomp_task_maybe_wait_for_dependencies also need the same type of TLC as they can also run tasks, but there are currently no tests that exercise it.

The detach support clearly needs more work, but is this particular patch okay for trunk?

Thanks

Kwok
From 12cc24c937e9294d5616dd0cd9a754c02ffb26fa Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung <k...@codesourcery.com>
Date: Thu, 21 Jan 2021 05:38:47 -0800
Subject: [PATCH] openmp: Fix intermittent hanging of task-detach-6 libgomp
 tests [PR98738]

This adds support for the task detach clause to taskwait, and fixes a
number of problems related to semaphores that may lead to a hang in
some circumstances.

2021-01-21  Kwok Cheung Yeung  <k...@codesourcery.com>

        libgomp/

        PR libgomp/98738
        * libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
        * task.c (task_fulfilled_p): Check detach field as well.
        (GOMP_task): Use address of task as the event handle.
        (gomp_barrier_handle_tasks): Fix indentation.  Use address of task
        as event handle. Set kind of suspended detach task to
        GOMP_TASK_DETACHED and decrement task_running_count.  Move
        finish_cancelled block out of else branch.  Skip decrement of
        task_running_count if task kind is GOMP_TASK_DETACHED.
        (GOMP_taskwait): Finish fulfilled detach tasks.  Update comment.
        Queue detach tasks that have not been fulfilled.
        (omp_fulfill_event): Use address of task as event handle.  Post
        to taskwait_sem and taskgroup_sem if necessary.  Check
        task_running_count before calling gomp_team_barrier_wake.
        * testsuite/libgomp.c-c++-common/task-detach-5.c (main): Change
        data-sharing of detach events on enclosing parallel to private.
        * testsuite/libgomp.c-c++-common/task-detach-6.c (main): Likewise.
        * testsuite/libgomp.fortran/task-detach-5.f90 (task_detach_5):
        Likewise.
        * testsuite/libgomp.fortran/task-detach-6.f90 (task_detach_6):
        Likewise.
---
 libgomp/libgomp.h                                  |   5 +-
 libgomp/task.c                                     | 155 ++++++++++++++-------
 .../testsuite/libgomp.c-c++-common/task-detach-5.c |   2 +-
 .../testsuite/libgomp.c-c++-common/task-detach-6.c |   2 +-
 .../testsuite/libgomp.fortran/task-detach-5.f90    |   2 +-
 .../testsuite/libgomp.fortran/task-detach-6.f90    |   2 +-
 6 files changed, 115 insertions(+), 53 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index b4d0c93..b24de5c 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -481,7 +481,10 @@ enum gomp_task_kind
      but not yet completed.  Once that completes, they will be readded
      into the queues as GOMP_TASK_WAITING in order to perform the var
      unmapping.  */
-  GOMP_TASK_ASYNC_RUNNING
+  GOMP_TASK_ASYNC_RUNNING,
+  /* Task that has finished executing but is waiting for its
+     completion event to be fulfilled.  */
+  GOMP_TASK_DETACHED
 };
 
 struct gomp_task_depend_entry
diff --git a/libgomp/task.c b/libgomp/task.c
index b242e7c..dbd6284 100644
--- a/libgomp/task.c
+++ b/libgomp/task.c
@@ -330,7 +330,7 @@ gomp_task_handle_depend (struct gomp_task *task, struct 
gomp_task *parent,
 static bool
 task_fulfilled_p (struct gomp_task *task)
 {
-  return gomp_sem_getcount (&task->completion_sem) > 0;
+  return task->detach && gomp_sem_getcount (&task->completion_sem) > 0;
 }
 
 /* Called when encountering an explicit task directive.  If IF_CLAUSE is
@@ -419,11 +419,11 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
        {
          task.detach = true;
          gomp_sem_init (&task.completion_sem, 0);
-         *(void **) detach = &task.completion_sem;
+         *(void **) detach = &task;
          if (data)
-           *(void **) data = &task.completion_sem;
+           *(void **) data = &task;
 
-         gomp_debug (0, "New event: %p\n", &task.completion_sem);
+         gomp_debug (0, "New event: %p\n", &task);
        }
 
       if (thr->task)
@@ -488,11 +488,11 @@ GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) 
(void *, void *),
        {
          task->detach = true;
          gomp_sem_init (&task->completion_sem, 0);
-         *(void **) detach = &task->completion_sem;
+         *(void **) detach = task;
          if (data)
-           *(void **) data = &task->completion_sem;
+           *(void **) data = task;
 
-         gomp_debug (0, "New event: %p\n", &task->completion_sem);
+         gomp_debug (0, "New event: %p\n", task);
        }
       thr->task = task;
       if (cpyfn)
@@ -1372,14 +1372,14 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
                                 child_task, MEMMODEL_RELAXED);
          --team->task_detach_count;
          gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
-                     thr->ts.team_id, &child_task->completion_sem);
+                     thr->ts.team_id, &child_task);
 
-       if (to_free)
-         {
-           gomp_finish_task (to_free);
-           free (to_free);
-           to_free = NULL;
-         }
+         if (to_free)
+           {
+             gomp_finish_task (to_free);
+             free (to_free);
+             to_free = NULL;
+           }
          goto finish_cancelled;
        }
 
@@ -1452,41 +1452,43 @@ gomp_barrier_handle_tasks (gomp_barrier_state_t state)
        {
          if (child_task->detach && !task_fulfilled_p (child_task))
            {
+             child_task->kind = GOMP_TASK_DETACHED;
              priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
                                     child_task, child_task->priority,
                                     PRIORITY_INSERT_END,
                                     false, false);
              ++team->task_detach_count;
-             gomp_debug (0, "thread %d: queueing task with event %p\n",
-                         thr->ts.team_id, &child_task->completion_sem);
+             --team->task_running_count;
+             gomp_debug (0,
+                         "thread %d: queuing detached task with event %p\n",
+                         thr->ts.team_id, child_task);
              child_task = NULL;
+             continue;
            }
-         else
+
+        finish_cancelled:;
+         size_t new_tasks
+           = gomp_task_run_post_handle_depend (child_task, team);
+         gomp_task_run_post_remove_parent (child_task);
+         gomp_clear_parent (&child_task->children_queue);
+         gomp_task_run_post_remove_taskgroup (child_task);
+         to_free = child_task;
+         if (!cancelled && child_task->kind != GOMP_TASK_DETACHED)
+           team->task_running_count--;
+         child_task = NULL;
+         if (new_tasks > 1)
            {
-            finish_cancelled:;
-             size_t new_tasks
-               = gomp_task_run_post_handle_depend (child_task, team);
-             gomp_task_run_post_remove_parent (child_task);
-             gomp_clear_parent (&child_task->children_queue);
-             gomp_task_run_post_remove_taskgroup (child_task);
-             to_free = child_task;
-             child_task = NULL;
-             if (!cancelled)
-               team->task_running_count--;
-             if (new_tasks > 1)
-               {
-                 do_wake = team->nthreads - team->task_running_count;
-                 if (do_wake > new_tasks)
-                   do_wake = new_tasks;
-               }
-             if (--team->task_count == 0
-                 && gomp_team_barrier_waiting_for_tasks (&team->barrier))
-               {
-                 gomp_team_barrier_done (&team->barrier, state);
-                 gomp_mutex_unlock (&team->task_lock);
-                 gomp_team_barrier_wake (&team->barrier, 0);
-                 gomp_mutex_lock (&team->task_lock);
-               }
+             do_wake = team->nthreads - team->task_running_count;
+             if (do_wake > new_tasks)
+               do_wake = new_tasks;
+           }
+         if (--team->task_count == 0
+             && gomp_team_barrier_waiting_for_tasks (&team->barrier))
+           {
+             gomp_team_barrier_done (&team->barrier, state);
+             gomp_mutex_unlock (&team->task_lock);
+             gomp_team_barrier_wake (&team->barrier, 0);
+             gomp_mutex_lock (&team->task_lock);
            }
        }
     }
@@ -1556,10 +1558,28 @@ GOMP_taskwait (void)
              goto finish_cancelled;
            }
        }
+      else if (next_task->kind == GOMP_TASK_DETACHED
+              && task_fulfilled_p (next_task))
+       {
+         child_task = next_task;
+         gomp_debug (0, "thread %d: found task with fulfilled event %p\n",
+                     thr->ts.team_id, &child_task);
+         priority_queue_remove (PQ_TEAM, &team->task_detach_queue,
+                                child_task, MEMMODEL_RELAXED);
+         --team->task_detach_count;
+         if (to_free)
+           {
+             gomp_finish_task (to_free);
+             free (to_free);
+             to_free = NULL;
+           }
+         goto finish_cancelled;
+       }
       else
        {
        /* All tasks we are waiting for are either running in other
-          threads, or they are tasks that have not had their
+          threads, are detached and waiting for the completion event to be
+          fulfilled, or they are tasks that have not had their
           dependencies met (so they're not even in the queue).  Wait
           for them.  */
          if (task->taskwait == NULL)
@@ -1614,6 +1634,21 @@ GOMP_taskwait (void)
       gomp_mutex_lock (&team->task_lock);
       if (child_task)
        {
+         if (child_task->detach && !task_fulfilled_p (child_task))
+           {
+             child_task->kind = GOMP_TASK_DETACHED;
+             priority_queue_insert (PQ_TEAM, &team->task_detach_queue,
+                                    child_task, child_task->priority,
+                                    PRIORITY_INSERT_END,
+                                    false, false);
+             ++team->task_detach_count;
+             gomp_debug (0,
+                         "thread %d: queuing detached task with event %p\n",
+                         thr->ts.team_id, child_task);
+             child_task = NULL;
+             continue;
+           }
+
         finish_cancelled:;
          size_t new_tasks
            = gomp_task_run_post_handle_depend (child_task, team);
@@ -2402,17 +2437,41 @@ ialias (omp_in_final)
 void
 omp_fulfill_event (omp_event_handle_t event)
 {
-  gomp_sem_t *sem = (gomp_sem_t *) event;
+  struct gomp_task *task = (struct gomp_task *) event;
+  struct gomp_task *parent = task->parent;
   struct gomp_thread *thr = gomp_thread ();
   struct gomp_team *team = thr ? thr->ts.team : NULL;
 
-  if (gomp_sem_getcount (sem) > 0)
-    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", sem);
+  if (gomp_sem_getcount (&task->completion_sem) > 0)
+    gomp_fatal ("omp_fulfill_event: %p event already fulfilled!\n", task);
 
-  gomp_debug (0, "omp_fulfill_event: %p\n", sem);
-  gomp_sem_post (sem);
-  if (team)
+  gomp_debug (0, "omp_fulfill_event: %p\n", task);
+  gomp_sem_post (&task->completion_sem);
+
+  /* Wake up any threads that may be waiting for the detached task
+     to complete.  */
+  gomp_mutex_lock (&team->task_lock);
+  if (parent && parent->taskwait)
+    {
+      if (parent->taskwait->in_taskwait)
+       {
+         parent->taskwait->in_taskwait = false;
+         gomp_sem_post (&parent->taskwait->taskwait_sem);
+       }
+      else if (parent->taskwait->in_depend_wait)
+       {
+         parent->taskwait->in_depend_wait = false;
+         gomp_sem_post (&parent->taskwait->taskwait_sem);
+       }
+    }
+  if (task->taskgroup && task->taskgroup->in_taskgroup_wait)
+    {
+      task->taskgroup->in_taskgroup_wait = false;
+      gomp_sem_post (&task->taskgroup->taskgroup_sem);
+    }
+  if (team && team->nthreads > team->task_running_count)
     gomp_team_barrier_wake (&team->barrier, 1);
+  gomp_mutex_unlock (&team->task_lock);
 }
 
 ialias (omp_fulfill_event)
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
index 5a01517..71bcde9 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-5.c
@@ -12,7 +12,7 @@ int main (void)
   int thread_count;
   omp_event_handle_t detach_event1, detach_event2;
 
-  #pragma omp parallel firstprivate(detach_event1, detach_event2)
+  #pragma omp parallel private(detach_event1, detach_event2)
   {
     #pragma omp single
       thread_count = omp_get_num_threads();
diff --git a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c 
b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
index b5f68cc..e7af05a 100644
--- a/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
+++ b/libgomp/testsuite/libgomp.c-c++-common/task-detach-6.c
@@ -14,7 +14,7 @@ int main (void)
   omp_event_handle_t detach_event1, detach_event2;
 
   #pragma omp target map(tofrom: x, y, z) map(from: thread_count)
-    #pragma omp parallel firstprivate(detach_event1, detach_event2)
+    #pragma omp parallel private(detach_event1, detach_event2)
       {
        #pragma omp single
          thread_count = omp_get_num_threads();
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
index 955d687..8bebb5c 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-5.f90
@@ -10,7 +10,7 @@ program task_detach_5
   integer :: x = 0, y = 0, z = 0
   integer :: thread_count
 
-  !$omp parallel firstprivate(detach_event1, detach_event2)
+  !$omp parallel private(detach_event1, detach_event2)
     !$omp single
       thread_count = omp_get_num_threads()
     !$omp end single
diff --git a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90 
b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
index 0fe2155..437ca66 100644
--- a/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
+++ b/libgomp/testsuite/libgomp.fortran/task-detach-6.f90
@@ -12,7 +12,7 @@ program task_detach_6
   integer :: thread_count
 
   !$omp target map(tofrom: x, y, z) map(from: thread_count)
-    !$omp parallel firstprivate(detach_event1, detach_event2)
+    !$omp parallel private(detach_event1, detach_event2)
       !$omp single
        thread_count = omp_get_num_threads()
       !$omp end single
-- 
2.8.1

Reply via email to