Hi Szilard,

   Thanks for your suggestions. I am indeed aware of this page. On an 8-core
AMD machine with 1 GPU, I am very happy with its performance; see below. My
intention is to obtain even better performance because we have multiple nodes.

### 8-core AMD with 1 GPU
Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1!


NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.

               Core t (s)   Wall t (s)        (%)
       Time:   216205.510    27036.812      799.7
                         7h30:36
                 (ns/day)    (hour/ns)
Performance:       31.956        0.751
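
Regarding the NOTE above (GPU >20% more loaded than the CPU): as I understand
it, shifting work back onto the CPU means shortening the cut-offs and using a
finer PME grid in the .mdp, roughly like the sketch below (the values are only
placeholders I made up, not from my actual input):

  rcoulomb        = 0.9    ; shorter cut-off -> less non-bonded work on the GPU
  rvdw            = 0.9    ; with the Verlet scheme rvdw must match rcoulomb
  fourierspacing  = 0.11   ; finer PME grid -> more PME work on the CPU

Though if I understand correctly, mdrun 4.6 already tries to balance this
automatically with its PME tuning, so I have not touched it.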

### 8-core AMD with 2 GPUs

               Core t (s)   Wall t (s)        (%)
       Time:   178961.450    22398.880      799.0
                         6h13:18
                 (ns/day)    (hour/ns)
Performance:       38.573        0.622
Finished mdrun on node 0 Sat Jul 13 09:24:39 2013


>However, in your case I suspect that the 
>bottleneck is multi-threaded scaling on the AMD CPUs and you should 
>probably decrease the number of threads per MPI rank and share GPUs 
>between 2-4 ranks.


OK, but can you give an example of the mdrun command for an 8-core AMD machine
with 2 GPUs? I will try to run it again.
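
Just so I understand what you mean by sharing GPUs between ranks: is it
something like the line below? This is only my guess from the 4.6
documentation on thread-MPI (-ntmpi), OpenMP threads (-ntomp) and GPU
mapping (-gpu_id); please correct it if I have the flags wrong:

  mdrun -ntmpi 4 -ntomp 2 -gpu_id 0011

i.e. 4 thread-MPI ranks with 2 OpenMP threads each on the 8 cores, with ranks
0-1 sharing GPU 0 and ranks 2-3 sharing GPU 1.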


>Regarding scaling across nodes, you can't expect much from gigabit 
>ethernet - especially not from the cheaper cards/switches, in my 
>experience even reaction field runs don't scale across nodes with 10G 
>ethernet if you have more than 4-6 ranks per node trying to 
>communicate (let alone with PME). However, on infiniband clusters we 
>have seen scaling to 100 atoms/core (at peak). 

From your comments, it sounds like a cluster of AMD CPUs is difficult to
scale across nodes in our current setup.

Let's assume we install InfiniBand (20 or 40 Gb/s) in the same system of 16
nodes, each an 8-core AMD machine with only 1 GPU. Given the same AMD
hardware, what is a good way to obtain better performance when we run a job
across nodes? In other words, what does the mdrun_mpi command look like?
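
For example, would something along these lines be reasonable? This is just my
naive sketch, assuming GROMACS 4.6 built with real MPI, the launcher placing 4
PP ranks on each of the 16 nodes (so the 4 ranks on a node share its single
GPU via -gpu_id 0000), and 2 OpenMP threads per rank; I am unsure about the
rank/thread split and whether separate PME ranks (-npme) are needed:

  mpirun -np 64 mdrun_mpi -ntomp 2 -gpu_id 0000

Is that roughly the right direction, or would you split the ranks differently?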

Thanks,
Dwey


    
