Hi Wolfgang,

Thanks for your answer. You were dead right about the threaded BLAS. I am using Intel's MKL, which is threaded with OpenMP, and it turns out that OpenMP is every bit as eager as TBB to spawn one thread per processor. I have added the environment variable setting

    export OMP_NUM_THREADS=N

to control the number of threads that the BLAS library tries to use. (NB: due to the program flow, the TBB threads are idle when the OMP threads are busy and vice versa, hence it seems right to set both to the maximum number of threads you want busy at any time. The TBB count can be one higher because the main thread is idle.) Thanks for the pointer.
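In case it is useful, here is a minimal sketch of how a program could read such a limit itself, so that the same number can be reused wherever a thread count is needed. The helper name threads_from_env is hypothetical and not part of deal.II, MKL or TBB, and the parsing is deliberately strict: anything other than a single positive integer (including the comma-separated form that OpenMP accepts for nested levels) falls back to the default.

```cpp
#include <climits>
#include <cstdlib>

// Hypothetical helper (not a deal.II or TBB function): read a positive
// thread count from the environment variable `name`. Returns `fallback`
// if the variable is unset, empty, not a plain integer, or out of range.
unsigned int threads_from_env (const char *name, const unsigned int fallback)
{
  const char *value = std::getenv (name);
  if (value == nullptr || *value == '\0')
    return fallback;

  char *end = nullptr;
  const long n = std::strtol (value, &end, 10);

  // Reject trailing characters (e.g. "4,2" or "abc") and values < 1
  // or larger than INT_MAX.
  if (*end != '\0' || n < 1 || n > INT_MAX)
    return fallback;

  return static_cast<unsigned int> (n);
}
```

With OMP_NUM_THREADS=4 exported, threads_from_env ("OMP_NUM_THREADS", 1) gives 4, and it gives 1 if the variable is not set. I have only sketched the parsing here; what the value is then used for is up to the caller.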

Regarding a final 'fix' for the number of TBB threads: the current situation is that TBB chooses how many threads to spawn, which is what its makers recommend. This is ideal for people running deal.II on private multicore machines, and even for people who want to run on a cluster with a single job per node. The problem occurs on large SMP machines, which I presume affects only a minority of users, and explicitly setting the number of threads is a usable workaround.

I think we need to be careful about putting in an explicit call to set the number of threads, since most users won't need it and it could be detrimental in some cases (especially if TBB comes up with smart algorithms for choosing how many threads to start). Perhaps the best approach is to document clearly how to control the number of threads, somewhere it will be found quickly by anyone who sees the problem, and to file a bug / feature request for an environment variable like OMP_NUM_THREADS in future versions of TBB?

Cheers,
Michael

On Tue, Jun 1, 2010 at 9:00 PM, Wolfgang Bangerth <[email protected]> wrote:
>
> Michael,
>
>> Your summary of TBB sounds like what I have seen in the various pieces
>> of documentation also. Yes, I placed the snippet of code in the main
>> function but this was overly cautious, it simply has to be before the
>> first time a task is used in the code. Currently that seems to be in
>> managing the DoFs (the ConstraintMatrix class) although this may be
>> different for different parts of the library and may change as the
>> library grows. It doesn't sound like changing the number of threads is
>> an option, but initializing the threads could wait. Perhaps a deal.II
>> method that hides a call to initialize the right number of TBB threads
>> and uses an Assert statement to throw an exception if the TBB threads
>> are already initialized could work?
>
> We could put a call to a function like the following in front of pretty
> much every call to TBB functions:
>
>   void initialize_tbb () {
>     static tbb::task_scheduler_init
>       dummy (multithread_info.n_default_threads);
>   }
>
> The initialization would happen only the first time around we call the
> function, but it would presumably be late enough so that if one were to
> set multithread_info.n_default_threads in main(), it gets picked up by the
> call. Calling this function in all places where we use the TBB would be
> annoying but probably not overly burdensome. There are currently only
> three files where we do anything with the TBB anyway:
>   base/include/base/{parallel,thread_management,work_stream}.h
>
> What would you think of this option?
>
>
>> One weird thing is that on SMP machines you tend to request resources
>> in terms of CPUs but use resources with threads. If you request 4 CPUs
>> and spawn only 4, possibly 5, threads then it is likely that the 4
>> CPUs are under utilized, especially if some of the threads are writing
>> to the disk. If you spawn more threads presumably they will be run on
>> more CPUs at some stage. I suppose this is where spawning a regular
>> thread to handle heavy disk usage, leaving the TBB threads to handle
>> computations, comes in.
>
> Yes, this is why we still have both Threads::new_thread and
> Threads::new_task. Threads are for things that can share CPUs, tasks for
> things that are best run start-to-finish.
>
>
>> Do you have any idea why the SparseDirectUMFPACK solver might somehow
>> run outside of the scope of the main function? (N.B. scope is possibly
>> another reason to initialize the TBB threads in main.) I believe that
>> it is a serial solver, but somewhere in the solver a stray call to TBB
>> tasks_scheduler_init seems to be occurring.
>
> I have not the slightest idea. I don't think UMFPACK uses the TBB itself.
> Are you using a version of BLAS that does things in parallel?
>
> W.
>
> -------------------------------------------------------------------------
> Wolfgang Bangerth                email: [email protected]
>                                  www: http://www.math.tamu.edu/~bangerth/
>
>
_______________________________________________
dealii mailing list http://poisson.dealii.org/mailman/listinfo/dealii
