Thank you, Mark, for the prompt response. I realize the limitations of the system (it's over 8 years old), but I did not expect the speed to drop by 50% with 12 available threads! No combination of -ntomp and -ntmpi could raise the rate above 4 ns/day with two GPUs, versus 6 ns/day with one GPU.
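To make that scan of -ntmpi/-ntomp combinations less tedious, a small script can generate the candidate command lines. This is only a sketch: it assumes the SR.sys.nvt.tpr input from the runs quoted below and GROMACS 2018's mdrun flags, and it only prints the commands so they can be reviewed before running:

```shell
#!/bin/sh
# Sketch: enumerate thread-MPI rank / OpenMP thread splits that use all
# 12 logical cores of the i7 970, and print the matching mdrun commands.
# Pipe the output to "sh" (or drop the echo) to actually run the benchmarks.
gen_mdrun_cmds() {
    for ntmpi in 2 4 6; do
        ntomp=$((12 / ntmpi))                # keep ntmpi * ntomp = 12
        echo gmx mdrun -deffnm SR.sys.nvt \
            -ntmpi "$ntmpi" -ntomp "$ntomp" \
            -gpu_id 01 -pin on \
            -nsteps 20000 -resethway         # short run; timers reset halfway
    done
}
gen_mdrun_cmds
```

The -resethway flag resets mdrun's timers halfway through the run, so the PME-tuning and load-balancing warm-up visible in the logs below does not distort the ns/day numbers from short benchmark runs.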
This is actually a learning/practice run for a new build: an AMD 4.2 GHz 32-core Threadripper with 64 GB RAM. For that machine I am trying to decide between one RTX 2080 Ti and two GTX 1080 Ti. I'd prefer the two 1080 Ti for their 7000 cores versus the 4500 cores of the 2080 Ti. The model systems will have about a million particles and need the speed, but this is a major expense, so I need to get it right. I'll do as you suggest and report the results for both systems. I really appreciate the assist.

Paul
UMN, BICB

On Dec 9 2018, at 4:32 pm, paul buscemi <pbusc...@q.com> wrote:
>
> Dear Users,
>
> I have good luck using a single GPU with the basic setup. However, in going
> from one GTX 1060 to a system with two, the rate for a 50,000-atom system
> decreased from 10 ns/day to 5 or worse. The system models a ligand, solvent
> (water), and a lipid membrane. The CPU is a 6-core Intel i7 970 (12 threads),
> with a 750 W power supply and 16 GB RAM.
>
> With the basic mdrun command I get:
>
> Back Off! I just backed up sys.nvt.log to ./#.sys.nvt.log.10#
> Reading file SR.sys.nvt.tpr, VERSION 2018.3 (single precision)
> Changing nstlist from 10 to 100, rlist from 1 to 1
>
> Using 2 MPI threads
> Using 6 OpenMP threads per tMPI thread
>
> On host I7 2 GPUs auto-selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
>   PP:0,PP:1
>
> Back Off! I just backed up SR.sys.nvt.trr to ./#SR.sys.nvt.trr.10#
> Back Off! I just backed up SR.sys.nvt.edr to ./#SR.sys.nvt.edr.10#
> NOTE: DLB will not turn on during the first phase of PME tuning
> starting mdrun 'SR-TA'
> 100000 steps, 100.0 ps.
>
> and ending with ^C
>
> Received the INT signal, stopping within 200 steps
>
> Dynamic load balancing report:
>   DLB was locked at the end of the run due to unfinished PP-PME balancing.
>   Average load imbalance: 0.7%.
>   The balanceable part of the MD step is 46%, load imbalance is computed
>   from this.
>   Part of the total run time spent waiting due to load imbalance: 0.3%.
>
>                Core t (s)   Wall t (s)        (%)
>        Time:      543.475       45.290     1200.0
>                  (ns/day)    (hour/ns)
> Performance:        1.719       13.963
>
> (before DLB is turned on)
>
> Very poor performance. I have been following, or trying to follow,
> "Performance Tuning and Optimization for GROMACS" by M. Abraham and
> R. Apostolov (2016), but have not yet cracked the code.
> ----------------
> gmx mdrun -deffnm SR.sys.nvt -ntmpi 2 -ntomp 3 -gpu_id 01 -pin on
>
> Back Off! I just backed up SR.sys.nvt.log to ./#SR.sys.nvt.log.13#
> Reading file SR.sys.nvt.tpr, VERSION 2018.3 (single precision)
> Changing nstlist from 10 to 100, rlist from 1 to 1
>
> Using 2 MPI threads
> Using 3 OpenMP threads per tMPI thread
>
> On host I7 2 GPUs auto-selected for this run.
> Mapping of GPU IDs to the 2 GPU tasks in the 2 ranks on this node:
>   PP:0,PP:1
>
> Back Off! I just backed up SR.sys.nvt.trr to ./#SR.sys.nvt.trr.13#
> Back Off! I just backed up SR.sys.nvt.edr to ./#SR.sys.nvt.edr.13#
> NOTE: DLB will not turn on during the first phase of PME tuning
> starting mdrun 'SR-TA'
> 100000 steps, 100.0 ps.
>
> NOTE: DLB can now turn on, when beneficial
> ^C
>
> Received the INT signal, stopping within 200 steps
>
> Dynamic load balancing report:
>   DLB was off during the run due to low measured imbalance.
>   Average load imbalance: 0.7%.
>   The balanceable part of the MD step is 46%, load imbalance is computed
>   from this.
>   Part of the total run time spent waiting due to load imbalance: 0.3%.
>
>                Core t (s)   Wall t (s)        (%)
>        Time:      953.837      158.973      600.0
>                  (ns/day)    (hour/ns)
> Performance:        2.935        8.176
>
> ====================
> The beginning of the log file is:
>
> GROMACS version:    2018.3
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> GPU support:        CUDA
> SIMD instructions:  SSE4.1
> FFT library:        fftw-3.3.8-sse2
> RDTSCP usage:       enabled
> TNG support:        enabled
> Hwloc support:      disabled
> Tracing support:    disabled
> Built on:           2018-10-19 21:26:38
> Built by:           pb@Q4 [CMAKE]
> Build OS/arch:      Linux 4.15.0-20-generic x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz
> Build CPU family:   6   Model: 44   Stepping: 2
> Build CPU features: aes apic clfsh cmov cx8 cx16 htt intel lahf mmx msr
>                     nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp
>                     sse2 sse3 sse4.1 sse4.2 ssse3
> C compiler:         /usr/bin/gcc-6 GNU 6.4.0
> C compiler flags:   -msse4.1 -O3 -DNDEBUG -funroll-all-loops
>                     -fexcess-precision=fast
> C++ compiler:       /usr/bin/g++-6 GNU 6.4.0
> C++ compiler flags: -msse4.1 -std=c++11 -O3 -DNDEBUG -funroll-all-loops
>                     -fexcess-precision=fast
> CUDA compiler:      /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;
>                     Copyright (c) 2005-2017 NVIDIA Corporation;
>                     Built on Fri_Nov__3_21:07:56_CDT_2017;
>                     Cuda compilation tools, release 9.1, V9.1.85
> CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;
>                     -gencode;arch=compute_35,code=sm_35;
>                     -gencode;arch=compute_37,code=sm_37;
>                     -gencode;arch=compute_50,code=sm_50;
>                     -gencode;arch=compute_52,code=sm_52;
>                     -gencode;arch=compute_60,code=sm_60;
>                     -gencode;arch=compute_61,code=sm_61;
>                     -gencode;arch=compute_70,code=sm_70;
>                     -gencode;arch=compute_70,code=compute_70;
>                     -use_fast_math;-D_FORCE_INLINES;
>                     -msse4.1;-std=c++11;-O3;-DNDEBUG;-funroll-all-loops;
>                     -fexcess-precision=fast
> CUDA driver:        9.10
> CUDA runtime:       9.10
>
> Running on 1 node with total 12 cores, 12 logical cores, 2 compatible GPUs
> Hardware detected:
>   CPU info:
>     Vendor: Intel
>     Brand:  Intel(R) Core(TM) i7 CPU 970 @ 3.20GHz
>     Family: 6   Model: 44   Stepping: 2
>     Features: aes apic clfsh cmov cx8 cx16 htt intel lahf mmx msr nonstop_tsc
>               pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1
>               sse4.2 ssse3
>   Hardware topology: Only logical processor count
>   GPU info:
>     Number of GPUs detected: 2
>     #0: NVIDIA GeForce GTX 1060 6GB, compute cap.: 6.1, ECC: no, stat: compatible
>     #1: NVIDIA GeForce GTX 1060 6GB, compute cap.: 6.1, ECC: no, stat: compatible
>
> There were no errors encountered during the runs. Suggestions would be
> appreciated.
>
> Regards
> Paul
> --
> Gromacs Users mailing list
> * Please search the archive at
>   http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
> * For (un)subscribe requests visit
>   https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send
>   a mail to gmx-users-requ...@gromacs.org.