On Fri, Feb 9, 2018 at 4:25 PM, Szilárd Páll <pall.szil...@gmail.com> wrote:
> Hi,
>
> First of all, have you read the docs (admittedly somewhat brief):
> http://manual.gromacs.org/documentation/2018/user-guide/mdrun-performance.html#types-of-gpu-tasks
>
> The current PME GPU implementation was optimized for single-GPU runs.
> Using multiple GPUs with PME offloaded works, but this mode hasn't been an
> optimization target and it will often not give very good performance.
> Using multiple GPUs requires a separate PME rank (as you have realized),
> only one such rank can be used (as we don't support PME decomposition on
> GPUs), and it comes with some inherent scaling drawbacks. For this reason,
> unless you _need_ your single run to be as fast as possible, you'll be
> better off running multiple simulations side by side.
>
> PS: You can of course also run on two GPUs and run two simulations
> side by side (on half of the cores for each) to improve the overall
> aggregate throughput you get out of the hardware (a sketch of such an
> invocation is given at the end of this thread).
>
> A few tips for tuning the performance of a multi-GPU run with PME offload:
> * expect at best ~1.5x scaling when going to 2 GPUs (rarely to 3, if the
>   tasks allow)
> * generally it's best to use about the same decomposition that you'd use
>   with nonbonded-only offload, e.g. in your case 6-8 ranks
> * map the PME GPU task to a GPU alone, or together with at most 1 PP rank,
>   i.e. use the new -gputasks option
>
> e.g. for your case I'd expect something like the following to work best:
> gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1 -gputasks 00000001
> or
> gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1 -gputasks 00000011
>
> Let me know if that gave some improvement.
>
> Cheers,
> --
> Szilárd
>
> On Fri, Feb 9, 2018 at 8:51 AM, Gmx QA <gmxquesti...@gmail.com> wrote:
>
>> Hi list,
>>
>> I am trying out the new GROMACS 2018 (really nice so far), but have a few
>> questions about what command line options I should specify, specifically
>> with the new GPU PME implementation.
>>
>> My computer has two CPUs (with 12 cores each, 24 threads each with
>> hyperthreading) and two GPUs, and I currently (with 2018) start
>> simulations like this:
>>
>> $ gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 2 -npme 1 -ntomp 24 -gpu_id 01
>>
>> This works, but GROMACS prints a message that 24 OpenMP threads per MPI
>> rank are likely inefficient. However, when I try to reduce the number of
>> OpenMP threads I see a reduction in performance. Is this message no longer
>> relevant with GPU PME, or am I overlooking something?
>>
>> Thanks
>> /PK
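For reference, a minimal sketch of the side-by-side approach described above, assuming the two-socket, 24-core (48 hardware threads with hyperthreading), two-GPU machine from the original post; the output names md1/md2 and the pin offset of 24 are illustrative and depend on how the hardware threads are numbered on the actual machine:

  # one simulation per GPU, each pinned to half of the hardware threads
  gmx mdrun -deffnm md1 -nb gpu -pme gpu -ntmpi 1 -ntomp 24 -gpu_id 0 -pin on -pinoffset 0 &
  gmx mdrun -deffnm md2 -nb gpu -pme gpu -ntmpi 1 -ntomp 24 -gpu_id 1 -pin on -pinoffset 24 &
  wait

Each run then uses a single rank that offloads both the nonbonded and PME tasks to its own GPU (the single-GPU mode the PME GPU code was optimized for), and the -pin/-pinoffset options keep the two runs from competing for the same cores.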