On Sat, Mar 28, 2020, at 9:32 PM, Kutzner, Carsten wrote:
>
> > Am 26.03.2020 um 17:00 schrieb Tobias Klöffel <tobias.kloef...@fau.de>:
> >
> > Hi Carsten,
> >
> > On 3/24/20 9:02 PM, Kutzner, Carsten wrote:
> >> Hi,
> >>
> >>> Am 24.03.2020 um 16:28 schrieb Tobias Klöffel <tobias.kloef...@fau.de>:
> >>>
> >>> Dear all,
> >>> I am very new to GROMACS, so maybe some of my problems are very easy to fix :)
> >>> Currently I am trying to compile and benchmark GROMACS on AMD Rome CPUs; the
> >>> benchmarks are taken from: https://www.mpibpc.mpg.de/grubmueller/bench
> >>>
> >>> 1) OpenMP parallelization: is it done via OpenMP tasks?
> >> Yes, all over the code loops are parallelized with OpenMP via #pragma omp parallel for
> >> and similar directives.
> > OK, but that's not OpenMP tasking :)
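For reference, the loop-level pattern described above looks roughly like the sketch below. The force-scaling loop and its names are made up purely for illustration and are not taken from the GROMACS sources; the point is only that the iterations of an ordinary loop are split across the existing team of threads with #pragma omp parallel for, rather than being expressed as OpenMP tasks.

/* Minimal sketch of loop-level OpenMP parallelization (illustrative only,
 * not GROMACS code). Compile with e.g.:  cc -fopenmp -c scale_forces.c    */

/* Hypothetical force-scaling loop: every iteration is independent, so the
 * iteration range is simply divided among the threads of the team.       */
void scale_forces(double *f, const double *w, int n, double lambda)
{
#pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
    {
        f[i] = lambda * w[i] * f[i];
    }
}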
> >>> If the Intel toolchain is detected and -DGMX_FFT_LIBRARY=mkl is set, -mkl=serial is
> >>> used, even though -DGMX_OPENMP=on is set.
> >> GROMACS uses only the serial transforms - allowing MKL to open up its own OpenMP
> >> threads would lead to oversubscription of cores and performance degradation.
> > Ah, I see. But then it should be noted somewhere in the documentation that all
> > FFTW/MKL calls are made inside a parallel region. Is there a specific reason for
> > this? Normally you can achieve much better performance if you call a threaded
> > library outside of a parallel region and let the library use its own threads.

Creating and destroying threads can be slow, and that is what threaded libraries do on entry and exit. So if a program is already using threads, it can be faster to have those threads call thread-safe versions of the serial library routines - provided the library supports that, which is likely the case for FFTW.

> >>> 2) I am trying to use gmx_mpi tune_pme, but I never got it to run. I do not really
> >>> understand what I have to specify for -mdrun.
> >> Normally you need a serial (read: non-MPI-enabled) 'gmx' so that you can call
> >> gmx tune_pme. Most queueing systems don't like it if one parallel program calls
> >> another parallel program.
> >>
> >>> I tried -mdrun 'gmx_mpi mdrun' and
> >>> export MPIRUN="mpirun -use-hwthread-cpus -np $tmpi -map-by ppr:$tnode:node:pe=$OMP_NUM_THREADS --report-bindings"
> >>> but it just complains that mdrun is not working.
> >> There should be an output somewhere with the exact command line that tune_pme
> >> invoked to test whether mdrun works. That should shed some light on the issue.
> >>
> >> Side note: tuning is normally only useful on CPU nodes. If your nodes also have
> >> GPUs, you will probably not want to do this kind of PME tuning.
> > Yes, it's CPU only... I will tune the PP:PME procs manually. However, most of the
> > time it fails with 'too large prime number' - what is considered to be 'too large'?
> I think 2, 3, 5, 7, 11, and 13, and multiples of these, are OK, but not larger
> prime numbers. So for a fixed number of procs, only some of the PP:PME combinations
> will actually work. The ones that don't work would not be wise to choose from a
> performance point of view.
>
> Best,
>   Carsten
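To make the threading point above concrete, here is a minimal sketch of the pattern being described: many independent transforms executed from an already-running OpenMP parallel region through the serial FFTW3 interface. The names, sizes, and batch layout are invented for the example; it is not GROMACS code. The library spawns no threads of its own, so the program's cores are never oversubscribed; the one thing to respect is that FFTW's planner is not thread-safe, so the plan is created once up front and only the thread-safe new-array execute call runs concurrently.

/* Sketch: serial FFTW called from inside an existing OpenMP parallel region.
 * Compile with e.g.:  cc -fopenmp fft_sketch.c -lfftw3 -lm                  */
#include <fftw3.h>
#include <stdlib.h>

int main(void)
{
    const int n      = 256;   /* transform length (arbitrary)     */
    const int nbatch = 64;    /* number of independent transforms */

    /* One aligned buffer per transform (in-place, complex-to-complex). */
    fftw_complex **buf = malloc(nbatch * sizeof(*buf));
    for (int b = 0; b < nbatch; b++)
    {
        buf[b] = fftw_malloc(n * sizeof(fftw_complex));
        for (int i = 0; i < n; i++)
        {
            buf[b][i][0] = (double)i;  /* arbitrary test data */
            buf[b][i][1] = 0.0;
        }
    }

    /* Plan once, serially: the FFTW planner is not thread-safe. */
    fftw_plan plan = fftw_plan_dft_1d(n, buf[0], buf[0], FFTW_FORWARD, FFTW_ESTIMATE);

    /* The program's own threads do the work; FFTW adds none. */
#pragma omp parallel for
    for (int b = 0; b < nbatch; b++)
    {
        /* New-array execute is thread-safe; the arrays have the same
         * in-place layout and alignment as the ones used for planning. */
        fftw_execute_dft(plan, buf[b], buf[b]);
    }

    fftw_destroy_plan(plan);
    for (int b = 0; b < nbatch; b++)
    {
        fftw_free(buf[b]);
    }
    free(buf);
    return 0;
}

And for the prime-factor rule of thumb: the small stand-alone helper below implements the heuristic exactly as stated in the thread (the largest prime factor of a rank count should not exceed 13). It is only an illustration for screening PP:PME splits before submitting a job, not the check GROMACS itself performs, and the 80/16 vs. 79/17 rank counts are made-up examples.

/* Sketch: screen candidate PP/PME rank counts by their largest prime factor. */
#include <stdio.h>

static int largest_prime_factor(int n)
{
    int largest = 1;
    for (int p = 2; p * p <= n; p++)
    {
        while (n % p == 0)
        {
            largest = p;
            n /= p;
        }
    }
    return (n > 1) ? n : largest;  /* any leftover n is itself prime */
}

int main(void)
{
    /* Example: 96 total ranks split as 80 PP + 16 PME versus 79 PP + 17 PME. */
    const int candidates[] = { 80, 16, 79, 17 };
    for (int i = 0; i < 4; i++)
    {
        const int lpf = largest_prime_factor(candidates[i]);
        printf("%3d ranks: largest prime factor %2d -> %s\n",
               candidates[i], lpf, (lpf <= 13) ? "ok" : "avoid");
    }
    return 0;
}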