I've come to the conclusion that simulations with 1 or 2 GPUs give me the
same performance:

  mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test
  mdrun -ntmpi 2 -ntomp 6 -gpu_id 0 -v -deffnm md_CaM_test

Could this be due to having too few CPU cores, or is additional RAM needed
(this system has 32 GB)? Or are some extra options needed in the
configuration?

James
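[Editor's note: a minimal way to isolate the effect of the second GPU
(assuming this box has 12 cores, as -ntmpi 2 -ntomp 6 suggests; the core
count is an assumption) would be to compare one rank on one GPU against two
ranks on two GPUs, for example:

  # 2 thread-MPI ranks, one GPU per rank (both GPUs used)
  mdrun -ntmpi 2 -ntomp 6 -gpu_id 01 -v -deffnm md_CaM_test
  # 1 thread-MPI rank, all cores on one rank, single GPU
  mdrun -ntmpi 1 -ntomp 12 -gpu_id 0 -v -deffnm md_CaM_test

The "Force evaluation time GPU/CPU" line near the end of md.log, as in the
numbers quoted below, shows whether the CPU or the GPU side is the
bottleneck in each case.]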
2013/11/6 Richard Broadbent <richard.broadben...@imperial.ac.uk>:

> Hi Dwey,
>
> On 05/11/13 22:00, Dwey Kauffman wrote:
>
>> Hi Szilard,
>>
>> Thanks for your suggestions. I am indeed aware of this page. On an 8-core
>> AMD machine with 1 GPU, I am very happy with its performance; see below.
>> My intention is to obtain an even better one because we have multiple
>> nodes.
>>
>> ### 8-core AMD with 1 GPU
>> Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
>> For optimal performance this ratio should be close to 1!
>>
>> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>> performance loss, consider using a shorter cut-off and a finer PME grid.
>>
>>                Core t (s)   Wall t (s)        (%)
>>        Time:   216205.510    27036.812      799.7
>>                              7h30:36
>>                  (ns/day)    (hour/ns)
>> Performance:       31.956        0.751
>>
>> ### 8-core AMD with 2 GPUs
>>
>>                Core t (s)   Wall t (s)        (%)
>>        Time:   178961.450    22398.880      799.0
>>                              6h13:18
>>                  (ns/day)    (hour/ns)
>> Performance:       38.573        0.622
>> Finished mdrun on node 0 Sat Jul 13 09:24:39 2013
>>
> I'm almost certain that Szilard meant the lines above this that give the
> breakdown of where the time is spent in the simulation.
>
> Richard
>
>>> However, in your case I suspect that the
>>> bottleneck is multi-threaded scaling on the AMD CPUs and you should
>>> probably decrease the number of threads per MPI rank and share GPUs
>>> between 2-4 ranks.
>>
>> OK, but can you give an example of an mdrun command, given an 8-core AMD
>> machine with 2 GPUs? I will try to run it again.
>>
>>> Regarding scaling across nodes, you can't expect much from gigabit
>>> ethernet - especially not from the cheaper cards/switches; in my
>>> experience even reaction-field runs don't scale across nodes with 10G
>>> ethernet if you have more than 4-6 ranks per node trying to
>>> communicate (let alone with PME). However, on Infiniband clusters we
>>> have seen scaling to 100 atoms/core (at peak).
>>
>> From your comments, it sounds like a cluster of AMD CPUs is difficult to
>> scale across nodes in our current setup.
>>
>> Let's assume we install Infiniband (20 or 40 Gb/s) in the same system of
>> 16 nodes, each an 8-core AMD with 1 GPU only. Considering the same AMD
>> system, what is a good way to obtain better performance when we run a
>> task across nodes? In other words, what does mdrun_mpi look like?
>>
>> Thanks,
>> Dwey
>>
>> --
>> View this message in context: http://gromacs.5086.x6.nabble.com/Gromacs-4-6-on-two-Titans-GPUs-tp5012186p5012279.html
>> Sent from the GROMACS Users Forum mailing list archive at Nabble.com.
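[Editor's note: regarding the quoted request for a concrete command on an
8-core AMD node with 2 GPUs, a sketch following Szilard's suggestion to use
fewer OpenMP threads per MPI rank and share each GPU between ranks (the run
name is just a placeholder) would be:

  # 4 thread-MPI ranks x 2 OpenMP threads on 8 cores;
  # ranks 0,1 share GPU 0 and ranks 2,3 share GPU 1
  mdrun -ntmpi 4 -ntomp 2 -gpu_id 0011 -v -deffnm md_test

Whether 2 or 4 ranks per GPU works better on this particular hardware is
something to measure rather than assume.]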
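[Editor's note: on the question of what mdrun_mpi would look like across 16
Infiniband-connected nodes, each with 8 cores and 1 GPU, a minimal sketch
assuming one MPI rank per node and the default PME rank selection (the MPI
launcher invocation and run name are assumptions) is:

  # one rank per node, 8 OpenMP threads per rank, the node's single GPU (id 0)
  mpirun -np 16 mdrun_mpi -ntomp 8 -gpu_id 0 -v -deffnm md_test

Whether dedicating separate PME ranks (via -npme) improves scaling at this
node count would have to be tested on the actual interconnect.]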