Thanks again

> On Wed, Sep 21, 2016 at 9:55 PM, <jkrie...@mrc-lmb.cam.ac.uk> wrote:
>> Thanks Sz.
>>
>> Do you think going up from version 5.0.4 to 5.1.4 would really make
>> such a big difference?
>
> Note that I was recommending using a modern compiler + the latest
> release (which is called 2016, not 5.1.4!). It's hard to guess the
> improvements, but from 5.0 -> 2016 you should see double-digit
> percentage improvements, and going from gcc 4.4 to 5.x or 6.0 will
> also bring a significant improvement.
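(For anyone finding this in the archive, a build along those lines would
presumably look roughly like the following - an untested sketch; the
compiler names, install prefix and -j value are placeholders for whatever
is available on the nodes:)

tar xzf gromacs-2016.tar.gz && cd gromacs-2016
mkdir build && cd build
cmake .. -DCMAKE_C_COMPILER=gcc-5 -DCMAKE_CXX_COMPILER=g++-5 \
         -DGMX_MPI=ON -DGMX_BUILD_OWN_FFTW=ON \
         -DCMAKE_INSTALL_PREFIX=$HOME/opt/gromacs-2016
make -j 8 && make install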
I was still thinking of 2016 as too new to be used for simulations I might
want to publish. I will try it when I can then.

>> Here is a log file from a single md run (that has finished, unlike the
>> metadynamics) with the number of OpenMP threads matching how many
>> threads there are on each node. This has been restarted a number of
>> times with different launch configurations, mostly varying the number
>> of nodes and the node type (either 8 CPUs or 24 CPUs).
>> https://www.dropbox.com/s/uxzsj3pm31n66nz/md.log?dl=0
>
> You seem to be using a single MPI rank per node in these runs. That
> will almost never be optimal, especially not when DD is not limited.

Yes, I only realised that recently, and I thought it might still be useful
to see this log as it is a complete run and has the performance breakdown
at the bottom. Here is a multiple walker metadynamics log, which includes
some other combinations I tried:
https://www.dropbox.com/s/td7ps45dzz1otwz/from_cluster_metad0.log?dl=0

>> From timesteps when checkpoints were written I can see that these
>> configurations make quite a difference and, per CPU, having 8 OpenMP
>> threads per MPI process becomes a much worse idea stepping from 4
>> nodes to 6 nodes, i.e. having more CPUs makes mixed parallelism less
>> favourable, as suggested in figure 8. Yes, the best may not lie at 1
>> OpenMP thread per MPI rank and may vary depending on the number of
>> CPUs as well.
>
> Sure, but 8 threads spanning two sockets will definitely be
> suboptimal. Start with trying fewer and consider using separate PME
> ranks, especially if you have ethernet.

ok

>> Also, I can see that for the same number of CPUs, the 24-thread nodes
>> are better than the 8-thread nodes, but I can't get so many of them as
>> they are also more popular with RELION users.
>
> FYI those are 2x6-core CPUs with Hyperthreading, so 2x12 hardware
> threads. They are also two generations newer, so it's not surprising
> that they are much faster. Still, 24 threads/node is too much. Use
> fewer.
>
>> What can I infer from the information at the end?
>
> Before starting to interpret that, it's worth fixing the above issues ;)
> Otherwise, what's clear is that PME is taking a considerable amount of
> time, especially given the long cut-off.
>
> Cheers,
> --
> Szilárd
>
>
>> Best wishes
>> James
>>
>>> Hi,
>>>
>>> On Wed, Sep 21, 2016 at 5:44 PM, <jkrie...@mrc-lmb.cam.ac.uk> wrote:
>>>> Hi Szilárd,
>>>>
>>>> Yes, I had looked at it but not with our cluster in mind. I now have
>>>> a couple of GPU systems (both have an 8-core i7-4790K CPU, with one
>>>> Titan X GPU on one system and two Titan X GPUs on the other), and
>>>> have been thinking about getting the most out of them. I listened to
>>>> Carsten's BioExcel webinar this morning and it got me thinking about
>>>> the cluster as well. I've just had a quick look now and it suggests
>>>> Nrank = Nc and Nth = 1 for high core counts, which I think worked
>>>> slightly less well for me, but I can't find the details so I may be
>>>> remembering wrong.
>>>
>>> That's not unexpected, the reported values are specific to the
>>> hardware and benchmark systems and only give a rough idea of where
>>> the ranks/threads balance should be.
>>>
>>>> I don't have log files from a systematic benchmark of our cluster as
>>>> it isn't really available enough for doing that.
>>>
>>> That's not really necessary, even logs from a single production run
>>> can hint at possible improvements.
>>>
>>>> I haven't tried gmx tune_pme on there either.
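(For the archive, my understanding from the help text is that a tune_pme
run would be launched roughly as follows - an untested sketch; the rank
count, tpr name and binary names are placeholders for whatever applies
locally:)

export MPIRUN=mpirun       # MPI launcher that tune_pme should use
export MDRUN=mdrun_mpi     # placeholder: local name of the MPI-enabled mdrun binary
gmx tune_pme -np 48 -s md.tpr -launch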
>>>> I do have node-specific installations of gromacs-5.0.4, but I think
>>>> they were done with gcc-4.4.7 so there's room for improvement there.
>>>
>>> If that's the case, I'd simply recommend using a modern compiler and,
>>> if you can, a recent GROMACS version; you'll gain more performance
>>> than from most launch config tuning.
>>>
>>>> The cluster nodes I have been using have the following CPU specs and
>>>> 10Gb networking. It could be that using 2 OpenMP threads per MPI
>>>> rank works nicely because it matches the CPU configuration and makes
>>>> better use of hyperthreading.
>>>
>>> Or because of the network. Or for some other reason. Again, comparing
>>> the runs' log files could tell more :)
>>>
>>>> Architecture:          x86_64
>>>> CPU op-mode(s):        32-bit, 64-bit
>>>> Byte Order:            Little Endian
>>>> CPU(s):                8
>>>> On-line CPU(s) list:   0-7
>>>> Thread(s) per core:    2
>>>> Core(s) per socket:    2
>>>> Socket(s):             2
>>>> NUMA node(s):          2
>>>> Vendor ID:             GenuineIntel
>>>> CPU family:            6
>>>> Model:                 26
>>>> Model name:            Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
>>>> Stepping:              5
>>>> CPU MHz:               2393.791
>>>> BogoMIPS:              4787.24
>>>> Virtualization:        VT-x
>>>> L1d cache:             32K
>>>> L1i cache:             32K
>>>> L2 cache:              256K
>>>> L3 cache:              8192K
>>>> NUMA node0 CPU(s):     0,2,4,6
>>>> NUMA node1 CPU(s):     1,3,5,7
>>>>
>>>> I appreciate that a lot is system-dependent and that I can't really
>>>> help you help me very much. It should also be noted that my multi
>>>> runs are multiple walker metadynamics runs and are slowing down
>>>> because there are large bias potentials in memory that need to be
>>>> communicated around too. As I said, I haven't had a chance to make
>>>> separate benchmark runs but have just made observations based upon
>>>> existing runs.
>>>
>>> Understandable, I was just giving tips and hints.
>>>
>>> Cheers,
>>> --
>>> Sz.
>>>
>>>
>>>> Best wishes
>>>> James
>>>>
>>>>> Performance tuning is highly dependent on the simulation system and
>>>>> the hardware you're running on. Questions like the ones you pose
>>>>> are impossible to answer meaningfully without *full* log files (and
>>>>> hardware specs including network).
>>>>>
>>>>> Have you checked the performance checklist I linked above?
>>>>> --
>>>>> Szilárd
>>>>>
>>>>>
>>>>> On Wed, Sep 21, 2016 at 11:36 AM, <jkrie...@mrc-lmb.cam.ac.uk>
>>>>> wrote:
>>>>>> I wonder whether what I see, that -np 108 and -ntomp 2 is best,
>>>>>> comes from using -multi 6 with 8-CPU nodes. That level of
>>>>>> parallelism may then be necessary to trigger automatic segregation
>>>>>> of PP and PME ranks. I'm not sure if I tried -np 54 and -ntomp 4,
>>>>>> which would probably also do it. I compared mostly on 196 CPUs,
>>>>>> then found going up to 216 was better than 196 with -ntomp 2, and
>>>>>> pure MPI (-ntomp 1) was considerably worse for both. Would people
>>>>>> recommend going back to 196, which allows 4 whole nodes per
>>>>>> replica, and playing with -npme and -ntomp_pme?
>>>>>>
>>>>>>> Hi Thanh Le,
>>>>>>>
>>>>>>> Assuming all the nodes are the same (9 nodes with 12 CPUs each),
>>>>>>> you could try the following:
>>>>>>>
>>>>>>> mpirun -np 9 --map-by node mdrun -ntomp 12 ...
>>>>>>> mpirun -np 18 mdrun -ntomp 6 ...
>>>>>>> mpirun -np 54 mdrun -ntomp 2 ...
>>>>>>>
>>>>>>> Which of these works best will depend on your setup.
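(If PME turns out to be the bottleneck, as Szilárd notes above for my
runs, a variant of those launches with dedicated PME ranks may also be
worth a try - an untested sketch; the -npme values are only starting
guesses and would need checking against the PP/PME load balance reported
in the log:)

mpirun -np 54 mdrun -ntomp 2 -npme 12 ...
mpirun -np 36 mdrun -ntomp 3 -npme 9 ...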
>>>>>>> Using the whole cluster for one job may not be the most efficient
>>>>>>> way. I found on our cluster that once I reach 216 CPUs (equivalent
>>>>>>> settings from the queuing system to -np 108 and -ntomp 2), I can't
>>>>>>> do better by adding more nodes (presumably because communication
>>>>>>> becomes an issue). In addition to running -multi or -multidir
>>>>>>> jobs, which takes the load off communication a bit, it may also be
>>>>>>> worth having separate jobs and using -pin on and -pinoffset.
>>>>>>>
>>>>>>> Best wishes
>>>>>>> James
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> I have a question concerning running gromacs in parallel. I have
>>>>>>>> read over
>>>>>>>> http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
>>>>>>>> but I still don't quite understand how to run it efficiently.
>>>>>>>> My gromacs version is 4.5.4.
>>>>>>>> The cluster I am using has 108 CPUs in total and 4 hosts up.
>>>>>>>> The node I am using:
>>>>>>>> Architecture:          x86_64
>>>>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>>>>> Byte Order:            Little Endian
>>>>>>>> CPU(s):                12
>>>>>>>> On-line CPU(s) list:   0-11
>>>>>>>> Thread(s) per core:    2
>>>>>>>> Core(s) per socket:    6
>>>>>>>> Socket(s):             1
>>>>>>>> NUMA node(s):          1
>>>>>>>> Vendor ID:             AuthenticAMD
>>>>>>>> CPU family:            21
>>>>>>>> Model:                 2
>>>>>>>> Stepping:              0
>>>>>>>> CPU MHz:               1400.000
>>>>>>>> BogoMIPS:              5200.57
>>>>>>>> Virtualization:        AMD-V
>>>>>>>> L1d cache:             16K
>>>>>>>> L1i cache:             64K
>>>>>>>> L2 cache:              2048K
>>>>>>>> L3 cache:              6144K
>>>>>>>> NUMA node0 CPU(s):     0-11
>>>>>>>> MPI is already installed. I also have permission to use the
>>>>>>>> cluster as much as I can.
>>>>>>>> My question is: how should I write my mdrun command to utilize
>>>>>>>> all the possible cores and nodes?
>>>>>>>> Thanks,
>>>>>>>> Thanh Le
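(To make the -pin/-pinoffset idea above concrete: with GROMACS 4.6 or
later, two independent single-node jobs sharing one 12-thread node could
be started roughly like this - an untested sketch; the file names are
placeholders and the right offsets depend on how the hardware threads are
numbered on the node:)

mdrun -nt 6 -pin on -pinoffset 0 -deffnm job1 &   # first job on hardware threads 0-5
mdrun -nt 6 -pin on -pinoffset 6 -deffnm job2 &   # second job on hardware threads 6-11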
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send
a mail to gmx-users-requ...@gromacs.org.