Thanks Nick,

> >Though Sylvain's original mail (*1) was sent 4 months ago and nobody
> >replied to it, I'm interested in this issue and strongly agree with
> >Sylvain.
> >
> > *1 http://www.open-mpi.org/community/lists/devel/2010/01/7275.php
> >
> >As explained by Sylvain, current Open MPI implementation always returns
> >MPI_THREAD_SINGLE as provided thread level if neither --enable-mpi-threads
> >nor --enable-progress-threads was specified at configure (v1.4).
>
> I can explain that, from an outside viewpoint. I can't tell you why
> OpenMPI took that decision, but I can guess.
>
> That is definitely the correct action. Unless an application or library
> has been built with thread support, or can guaranteed to be called only
> from a single thread, using threads is catastrophic. And, regrettably,
> given modern approaches to building software and the **** configure
> design, configure is where the test has to go.
What does "with thread support" mean? Does it mean configure
--enable-mpi-threads? As long as the MPI library returns
MPI_THREAD_FUNNELED from MPI_Init_thread and the MPI application
follows it, MPI functions are guaranteed to be called only from a
single thread. I think that's enough for MPI_THREAD_FUNNELED.
Of course, it's not enough for MPI_THREAD_MULTIPLE.

Ah, does "library" in your mail mean libc or something other than the
MPI library? If so, it makes sense. Because MPI_THREAD_FUNNELED and
MPI_THREAD_SERIALIZED don't restrict other threads from calling
functions outside the MPI library, the code below is not thread-safe
if malloc is not thread-safe and MPI_Allreduce calls malloc.
(I've corrected my earlier snippet: a bare block needs "omp parallel",
not "omp parallel for", and MPI_Is_thread_main sets its flag to true
on the main thread.)

  #pragma omp parallel private(is_master)
  {
      MPI_Is_thread_main(&is_master);
      if (is_master) {
          /* main (master) thread */
          MPI_Allreduce(...);
      } else {
          /* other threads */
          /* work that calls malloc */
      }
  }

> On some systems, there are certain actions that require thread affinity
> (sometimes including I/O, and often undocumented). zOS is one, but I
> have seen it under a few Unices, too.
>
> On others, they use a completely different (and seriously incompatible,
> at both the syntactic and semantic levels) set of libraries. E.g. AIX.

Sorry, I don't know these issues well. Do you mean the case I wrote
above about malloc?

> >If we use OpenMP with MPI, we need at least MPI_THREAD_FUNNELED even
> >if MPI functions are called only outside of omp parallel region,
> >like below.
> >
> > #pragma omp parallel for
> > for (...) {
> >     /* computation */
> > }
> > MPI_Allreduce(...);
> > #pragma omp parallel for
> > for (...) {
> >     /* computation */
> > }
>
> I don't think that's correct. That would call MPI_Allreduce once for
> each thread, in parallel, on the same process - which wouldn't work.
> I think that what you need is a primitive that OpenMP doesn't have (in
> general), which is a GLOBAL_MASTER construct. What you have to do is:
>
> Each process finds its initial (system) thread id on entry.
> You test the system thread and call MPI only if on that one.

In C, an omp parallel region ends with the for-block. So I think that
would call MPI_Allreduce once per process.

# In Fortran, it may require an omp end parallel directive to end the
# parallel region. But I don't know Fortran well, sorry.

> >This means Open MPI users must specify --enable-mpi-threads or
> >--enable-progress-threads to use OpenMP. Is it true?
> >But this two configure options, i.e. OMPI_HAVE_THREAD_SUPPORT macro,
> >lead to performance penalty by mutex lock/unlock.
>
> That's unavoidable, in general, with one niggle. If the programmer
> guarantees BOTH to call MPI on the global master thread AND to ensure
> that all memory is synchronised before it does so, there is no need
> for mutexes. The MPI specification lacks some of the necessary
> paranoia in this respect.
>
> >I believe OMPI_HAVE_THREADS (not OMPI_HAVE_THREAD_SUPPORT !) is sufficient
> >to support MPI_THREAD_FUNNELED and MPI_THREAD_SERIALIZED, and therefore
> >OMPI_HAVE_THREAD_SUPPORT should be OMPI_HAVE_THREADS at following
> >part in ompi_mpi_init function, as suggested by Sylvain.
>
> I can't comment on that, though I doubt it's quite that simple. There's
> a big difference between MPI_THREAD_FUNNELED and MPI_THREAD_SERIALIZED
> in implementation impact.

I can't imagine a difference between those two, unless the MPI library
uses something thread-local. Ah, there may be something on OSes that I
don't know....

Anyway, thanks for your comment!

Regards,
Kawashima