On Fri, Feb 9, 2018 at 4:25 PM, Szilárd Páll <pall.szil...@gmail.com> wrote:

> Hi,
>
> First of all, have you read the docs (admittedly somewhat brief):
> http://manual.gromacs.org/documentation/2018/user-guide/mdrun-performance.html#types-of-gpu-tasks
>
> The current PME GPU was optimized for single-GPU runs. Using multiple GPUs
> with PME offloaded works, but this mode hasn't been an optimization target
> and it will often not give very good performance. Using multiple GPUs
> requires a separate PME rank (as you have realized), only one such rank can
> be used (as we don't support PME decomposition on GPUs), and it comes with
> some inherent scaling drawbacks. For this reason, unless you _need_ your
> single run to be as fast as possible, you'll be better off running multiple
> simulations side-by-side.
>

PS: You can of course also run on two GPUs and run two simulations
side-by-side (on half of the cores for each) to improve the overall
aggregate throughput you get out of the hardware.
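
To make the side-by-side setup concrete, here is a minimal dry-run sketch that only prints the two command lines rather than launching anything. It assumes 48 hardware threads and GPU ids 0 and 1, as on the machine described in this thread; the run names sim0/sim1 are placeholders, and the choice of -pinstride 1 with offsets 0 and 24 is my assumption about how the hardware threads are laid out, so check the pinning report in md.log before relying on it.

```shell
#!/bin/sh
# Dry run: print (do not execute) one mdrun command per simulation,
# giving each simulation half of the hardware threads and one GPU.
# Assumed: 48 hardware threads, 2 GPUs; sim0/sim1 are placeholder names.

total_threads=48
n_sims=2
threads_per_sim=$((total_threads / n_sims))

# Build the command line for simulation number $1 (0-based).
mdrun_cmd() {
    sim=$1
    offset=$((sim * threads_per_sim))   # first hardware thread for this run
    echo "gmx mdrun -deffnm sim$sim -nb gpu -pme gpu -ntmpi 1 -ntomp $threads_per_sim -pin on -pinoffset $offset -pinstride 1 -gpu_id $sim"
}

n=0
while [ "$n" -lt "$n_sims" ]; do
    mdrun_cmd "$n"
    n=$((n + 1))
done
```

Each printed line can be started in the background from its own run directory. The explicit offsets matter because two mdrun processes started with automatic pinning would not know about each other and could end up on the same cores.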


>
> A few tips for tuning the performance of a multi-GPU run with PME offload:
> * expect to get at best 1.5x scaling to 2 GPUs (rarely 3 if the tasks allow)
> * generally it's best to use about the same decomposition that you'd use
> with nonbonded-only offload, e.g. in your case 6-8 ranks
> * map the PME GPU task alone or at most together with 1 PP rank to a GPU,
> i.e. use the new -gputasks option
> e.g. for your case I'd expect the following to work ~best:
> gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1
> -gputasks 00000001
> or
> gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 8 -ntomp 6 -npme 1
> -gputasks 00000011
>
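
To read the two -gputasks strings above: each digit is the GPU id for one GPU task, in rank order, and with -npme 1 the separate PME rank comes last (this follows the example in the docs linked earlier). A small sketch that spells the mapping out; the helper names describe_gputasks and count_on_gpu are made up for illustration and are not part of GROMACS.

```shell
#!/bin/sh
# Print which GPU id each GPU task receives from a -gputasks string.
# Assumed: one digit per GPU task, in rank order; with -npme 1 the
# last task belongs to the separate PME rank.
describe_gputasks() {
    rest=$1
    task=0
    while [ -n "$rest" ]; do
        gpu=${rest%"${rest#?}"}   # first character of the remaining string
        rest=${rest#?}            # drop that character
        echo "task $task -> GPU $gpu"
        task=$((task + 1))
    done
}

# Count how many tasks in string $1 are mapped to GPU id $2.
count_on_gpu() {
    describe_gputasks "$1" | grep -c "GPU $2\$"
}

describe_gputasks 00000001
```

So 00000001 puts the seven PP tasks on GPU 0 and leaves GPU 1 to PME alone, while 00000011 moves one PP task over to share GPU 1 with PME. Note also that 8 ranks x 6 OpenMP threads uses all 48 hardware threads of the machine.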
>
> Let me know if that gave some improvement.
>
> Cheers,
>
> --
> Szilárd
>
> On Fri, Feb 9, 2018 at 8:51 AM, Gmx QA <gmxquesti...@gmail.com> wrote:
>
>> Hi list,
>>
>> I am trying out the new gromacs 2018 (really nice so far), but have a few
>> questions about what command line options I should specify, specifically
>> with the new gpu pme implementation.
>>
>> My computer has two CPUs (with 12 cores each, 24 with hyper threading) and
>> two GPUs, and I currently (with 2018) start simulations like this:
>>
>> $ gmx mdrun -v -deffnm md -pme gpu -nb gpu -ntmpi 2 -npme 1 -ntomp 24
>> -gpu_id 01
>>
>> This works, but gromacs prints the message that 24 omp threads per mpi
>> rank is likely inefficient. However, when I reduce the number of omp
>> threads I see a reduction in performance. Is this message no longer
>> relevant with gpu pme, or am I overlooking something?
>>
>> Thanks
>> /PK
>> --
>> Gromacs Users mailing list
>>
>> * Please search the archive at
>> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>>
>> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>>
>> * For (un)subscribe requests visit
>> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or
>> send a mail to gmx-users-requ...@gromacs.org.
>>
>
>