Guido, Wolfgang,

> can you identify the mutex causing the trouble? There should not be any 
> anymore in FETools::compute_embedding_matrices, unless the TaskGroup 
> introduces one.

It's the 'completion_mutex' in TaskDescriptor. Somehow, all the
threads end up waiting to acquire a mutex, and this blocks everything.

Wolfgang, why does TaskDescriptor use a mutex at all? Is there a
particular reason why it is needed?

We've looked into base/thread_management.h, and our conclusion is that
TBB should be able to handle the whole process of task scheduling by
itself, in particular the termination (where the mutex comes into play).
We attach a patch that lets TBB wait for the task to complete (by calling
tbb::task::wait_for_all() in TaskDescriptor::join() rather than right
before the task object is destroyed). The only restriction TBB seems to
impose is that wait_for_all() is called only once. With this patch,
everything works fine on our machines.
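
To illustrate the shape of the change without depending on TBB, here is a
minimal sketch that uses std::async and std::shared_future as a stand-in
for the TBB scheduler. The class TaskSketch and its members are
hypothetical names chosen for illustration; this is not the deal.II code
and not the TBB API. It only shows the idea of join() checking the flag
first and otherwise letting the runtime do the waiting, instead of
blocking on a hand-rolled mutex.

```cpp
#include <atomic>
#include <future>

// Hypothetical stand-in for TaskDescriptor; std::async replaces TBB here.
class TaskSketch
{
public:
  template <typename F>
  explicit TaskSketch(F f)
    : task_is_done(false)
  {
    // Launch the work on another thread. The flag is set only after
    // the user function has returned.
    done = std::async(std::launch::async,
                      [this, f]() mutable {
                        f();
                        task_is_done = true;
                      }).share();
  }

  // Wait until the task has finished.
  void join()
  {
    // Fast path: the task has already indicated that it is done.
    if (task_is_done)
      return;

    // Otherwise let the runtime wait for completion; no user-level
    // mutex is locked in one thread and released in another.
    done.wait();
  }

  bool finished() const { return task_is_done; }

private:
  std::atomic<bool>        task_is_done;
  // Declared last so it is destroyed first; its destructor waits for
  // the asynchronous work before the flag above goes away.
  std::shared_future<void> done;
};
```

In this sketch, join() can safely be called more than once, because
std::shared_future::wait() may be invoked repeatedly. That differs from
the restriction on wait_for_all() mentioned above, so the sketch should
not be read as a statement about TBB's behaviour.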

Wolfgang, can you please test this on your machines and see if it works
for you, too?

Best,
Katharina and Martin

Index: include/deal.II/base/thread_management.h
===================================================================
--- include/deal.II/base/thread_management.h	(revision 22834)
+++ include/deal.II/base/thread_management.h	(working copy)
@@ -3694,22 +3694,8 @@
 	    call (task_descriptor.function, task_descriptor.ret_val);
 
 					     // indicate that the task
-					     // has finished, both
-					     // through the flag and
-					     // through the mutex that
-					     // was locked before we
-					     // started and that now
-					     // needs to be
-					     // released. this may
-					     // also wake up all
-					     // threads that may be
-					     // waiting for the task's
-					     // demise by blocking on
-					     // completion_mutex.acquire()
-					     // in
-					     // TaskDescriptor::join().
+					     // has finished
 	    task_descriptor.task_is_done = true;
-	    task_descriptor.completion_mutex.release ();
 
 	    return 0;
 	  }
@@ -3798,19 +3784,6 @@
 					  */
 	bool task_is_done;
 
-                                         /**
-                                          * Mutex used to indicate
-                                          * when the task is done. It
-                                          * is locked before the task
-                                          * is spawned; the join()
-                                          * function tries to acquire
-                                          * it, but that will fail
-                                          * unless the task has
-                                          * unlocked it, which it does
-                                          * upon completion.
-                                          */
-        mutable ThreadMutex completion_mutex;
-
       public:
 
 					 /**
@@ -3894,11 +3867,6 @@
     void
     TaskDescriptor<RT>::queue_task ()
     {
-				       // lock the mutex. it will
-				       // become unlocked when the
-				       // task is done
-      completion_mutex.acquire ();
-
 				       // use the pattern described in
 				       // the TBB book on pages
 				       // 230/231 ("Start a large task
@@ -3958,7 +3926,6 @@
 				       // explicitly destroy the empty
 				       // task object
       Assert (task != 0, ExcInternalError());
-      task->wait_for_all ();
       task->destroy (*task);
     }
 
@@ -3969,21 +3936,19 @@
     TaskDescriptor<RT>::join ()
     {
                                        // use Schmidt's double checking
-                                       // pattern: if thread has already
+                                       // pattern: if task has already
                                        // indicated that it is done, then
                                        // return immediately
       if (task_is_done)
 	return;
 
-				       // acquire the lock; this can
-				       // only succeed when the task
-				       // is done
-      completion_mutex.acquire ();
+				       // let TBB wait for the task to
+				       // complete. TBB will make sure that it
+				       // ends properly.
+      task->wait_for_all();
 
-				       // release it again; at this
-				       // point the task must have
+				       // at this point the task must have
 				       // finished
-      completion_mutex.release ();
       Assert (task_is_done == true, ExcInternalError());
     }
 
_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii
