Thanks again

> On Wed, Sep 21, 2016 at 9:55 PM, <jkrie...@mrc-lmb.cam.ac.uk> wrote:
>> Thanks Sz.
>>
>> Do you think going up from version 5.0.4 to 5.1.4 would really make
>> such a big difference?
>
> Note that I was recommending using a modern compiler + the latest
> release (which is called 2016, not 5.1.4!). It's hard to guess the
> improvements, but from 5.0 -> 2016 you should see double-digit
> percentage improvements, and going from gcc 4.4 to 5.x or 6.0 will
> also bring a significant improvement.
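(For anyone finding this in the archive, a build along those lines would
presumably look roughly like the following - an untested sketch; the
compiler names, install prefix and -j value are placeholders for whatever
is available on the nodes:)

tar xzf gromacs-2016.tar.gz && cd gromacs-2016
mkdir build && cd build
cmake .. -DCMAKE_C_COMPILER=gcc-5 -DCMAKE_CXX_COMPILER=g++-5 \
         -DGMX_MPI=ON -DGMX_BUILD_OWN_FFTW=ON \
         -DCMAKE_INSTALL_PREFIX=$HOME/opt/gromacs-2016
make -j 8 && make install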
I was still thinking of 2016 as too new to be used for simulations I might
want to publish. I will try it when I can then.

>> Here is a log file from a single md run (that has finished, unlike the
>> metadynamics) with the number of OpenMP threads matching how many
>> threads there are on each node. This has been restarted a number of
>> times with different launch configurations, mostly varying the number
>> of nodes and the node type (either 8 CPUs or 24 CPUs).
>> https://www.dropbox.com/s/uxzsj3pm31n66nz/md.log?dl=0
>
> You seem to be using a single MPI rank per node in these runs. That
> will almost never be optimal, especially not when DD is not limited.

Yes, I only realised that recently, and I thought it might still be useful
to see this log as it is a complete run and has the performance breakdown
at the bottom. Here is a multiple walker metadynamics log, which includes
some other combinations I tried:
https://www.dropbox.com/s/td7ps45dzz1otwz/from_cluster_metad0.log?dl=0

>> From timesteps when checkpoints were written I can see that these
>> configurations make quite a difference and, per CPU, having 8 OpenMP
>> threads per MPI process becomes a much worse idea stepping from 4
>> nodes to 6 nodes, i.e. having more CPUs makes mixed parallelism less
>> favourable, as suggested in figure 8. Yes, the best may not lie at 1
>> OpenMP thread per MPI rank and may vary depending on the number of
>> CPUs as well.
>
> Sure, but 8 threads spanning two sockets will definitely be
> suboptimal. Start with trying fewer and consider using separate PME
> ranks, especially if you have ethernet.

ok

>> Also, I can see that for the same number of CPUs, the 24-thread nodes
>> are better than the 8-thread nodes, but I can't get so many of them as
>> they are also more popular with RELION users.
>
> FYI those are 2x6-core CPUs with Hyperthreading, so 2x12 hardware
> threads. They are also two generations newer, so it's not surprising
> that they are much faster. Still, 24 threads/node is too much. Use
> fewer.
>
>> What can I infer from the information at the end?
>
> Before starting to interpret that, it's worth fixing the above issues ;)
> Otherwise, what's clear is that PME is taking a considerable amount of
> time, especially given the long cut-off.
>
> Cheers,
> --
> Szilárd
>
>
>> Best wishes
>> James
>>
>>> Hi,
>>>
>>> On Wed, Sep 21, 2016 at 5:44 PM, <jkrie...@mrc-lmb.cam.ac.uk> wrote:
>>>> Hi Szilárd,
>>>>
>>>> Yes, I had looked at it but not with our cluster in mind. I now have
>>>> a couple of GPU systems (both have an 8-core i7-4790K CPU, with one
>>>> Titan X GPU on one system and two Titan X GPUs on the other), and
>>>> have been thinking about getting the most out of them. I listened to
>>>> Carsten's BioExcel webinar this morning and it got me thinking about
>>>> the cluster as well. I've just had a quick look now and it suggests
>>>> Nrank = Nc and Nth = 1 for high core counts, which I think worked
>>>> slightly less well for me, but I can't find the details so I may be
>>>> remembering wrong.
>>>
>>> That's not unexpected, the reported values are specific to the
>>> hardware and benchmark systems and only give a rough idea of where
>>> the ranks/threads balance should be.
>>>
>>>> I don't have log files from a systematic benchmark of our cluster as
>>>> it isn't really available enough for doing that.
>>>
>>> That's not really necessary, even logs from a single production run
>>> can hint at possible improvements.
>>>
>>>> I haven't tried gmx tune_pme on there either.
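(For the archive, my understanding from the help text is that a tune_pme
run would be launched roughly as follows - an untested sketch; the rank
count, tpr name and binary names are placeholders for whatever applies
locally:)

export MPIRUN=mpirun       # MPI launcher that tune_pme should use
export MDRUN=mdrun_mpi     # placeholder: local name of the MPI-enabled mdrun binary
gmx tune_pme -np 48 -s md.tpr -launch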
>>>> I do have node-specific installations of gromacs-5.0.4, but I think
>>>> they were done with gcc-4.4.7 so there's room for improvement there.
>>>
>>> If that's the case, I'd simply recommend using a modern compiler and,
>>> if you can, a recent GROMACS version; you'll gain more performance
>>> than from most launch config tuning.
>>>
>>>> The cluster nodes I have been using have the following CPU specs and
>>>> 10Gb networking. It could be that using 2 OpenMP threads per MPI
>>>> rank works nicely because it matches the CPU configuration and makes
>>>> better use of hyperthreading.
>>>
>>> Or because of the network. Or for some other reason. Again, comparing
>>> the runs' log files could tell more :)
>>>
>>>> Architecture:          x86_64
>>>> CPU op-mode(s):        32-bit, 64-bit
>>>> Byte Order:            Little Endian
>>>> CPU(s):                8
>>>> On-line CPU(s) list:   0-7
>>>> Thread(s) per core:    2
>>>> Core(s) per socket:    2
>>>> Socket(s):             2
>>>> NUMA node(s):          2
>>>> Vendor ID:             GenuineIntel
>>>> CPU family:            6
>>>> Model:                 26
>>>> Model name:            Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
>>>> Stepping:              5
>>>> CPU MHz:               2393.791
>>>> BogoMIPS:              4787.24
>>>> Virtualization:        VT-x
>>>> L1d cache:             32K
>>>> L1i cache:             32K
>>>> L2 cache:              256K
>>>> L3 cache:              8192K
>>>> NUMA node0 CPU(s):     0,2,4,6
>>>> NUMA node1 CPU(s):     1,3,5,7
>>>>
>>>> I appreciate that a lot is system-dependent and that I can't really
>>>> help you help me very much. It should also be noted that my multi
>>>> runs are multiple walker metadynamics runs and are slowing down
>>>> because there are large bias potentials in memory that need to be
>>>> communicated around too. As I said, I haven't had a chance to make
>>>> separate benchmark runs but have just made observations based upon
>>>> existing runs.
>>>
>>> Understandable, I was just giving tips and hints.
>>>
>>> Cheers,
>>> --
>>> Sz.
>>>
>>>
>>>> Best wishes
>>>> James
>>>>
>>>>> Performance tuning is highly dependent on the simulation system and
>>>>> the hardware you're running on. Questions like the ones you pose
>>>>> are impossible to answer meaningfully without *full* log files (and
>>>>> hardware specs including network).
>>>>>
>>>>> Have you checked the performance checklist I linked above?
>>>>> --
>>>>> Szilárd
>>>>>
>>>>>
>>>>> On Wed, Sep 21, 2016 at 11:36 AM, <jkrie...@mrc-lmb.cam.ac.uk>
>>>>> wrote:
>>>>>> I wonder whether what I see, that -np 108 and -ntomp 2 is best,
>>>>>> comes from using -multi 6 with 8-CPU nodes. That level of
>>>>>> parallelism may then be necessary to trigger automatic segregation
>>>>>> of PP and PME ranks. I'm not sure if I tried -np 54 and -ntomp 4,
>>>>>> which would probably also do it. I compared mostly on 196 CPUs,
>>>>>> then found going up to 216 was better than 196 with -ntomp 2, and
>>>>>> pure MPI (-ntomp 1) was considerably worse for both. Would people
>>>>>> recommend going back to 196, which allows 4 whole nodes per
>>>>>> replica, and playing with -npme and -ntomp_pme?
>>>>>>
>>>>>>> Hi Thanh Le,
>>>>>>>
>>>>>>> Assuming all the nodes are the same (9 nodes with 12 CPUs each),
>>>>>>> you could try the following:
>>>>>>>
>>>>>>> mpirun -np 9 --map-by node mdrun -ntomp 12 ...
>>>>>>> mpirun -np 18 mdrun -ntomp 6 ...
>>>>>>> mpirun -np 54 mdrun -ntomp 2 ...
>>>>>>>
>>>>>>> Which of these works best will depend on your setup.
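(If PME turns out to be the bottleneck, as Szilárd notes above for my
runs, a variant of those launches with dedicated PME ranks may also be
worth a try - an untested sketch; the -npme values are only starting
guesses and would need checking against the PP/PME load balance reported
in the log:)

mpirun -np 54 mdrun -ntomp 2 -npme 12 ...
mpirun -np 36 mdrun -ntomp 3 -npme 9 ...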
>>>>>>> Using the whole cluster for one job may not be the most efficient
>>>>>>> way. I found on our cluster that once I reach 216 CPUs (equivalent
>>>>>>> settings from the queuing system to -np 108 and -ntomp 2), I can't
>>>>>>> do better by adding more nodes (presumably because communication
>>>>>>> becomes an issue). In addition to running -multi or -multidir
>>>>>>> jobs, which takes the load off communication a bit, it may also be
>>>>>>> worth having separate jobs and using -pin on and -pinoffset.
>>>>>>>
>>>>>>> Best wishes
>>>>>>> James
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>> I have a question concerning running gromacs in parallel. I have
>>>>>>>> read over
>>>>>>>> http://manual.gromacs.org/documentation/5.1/user-guide/mdrun-performance.html
>>>>>>>> but I still don't quite understand how to run it efficiently.
>>>>>>>> My gromacs version is 4.5.4.
>>>>>>>> The cluster I am using has 108 CPUs in total and 4 hosts up.
>>>>>>>> The node I am using:
>>>>>>>> Architecture:          x86_64
>>>>>>>> CPU op-mode(s):        32-bit, 64-bit
>>>>>>>> Byte Order:            Little Endian
>>>>>>>> CPU(s):                12
>>>>>>>> On-line CPU(s) list:   0-11
>>>>>>>> Thread(s) per core:    2
>>>>>>>> Core(s) per socket:    6
>>>>>>>> Socket(s):             1
>>>>>>>> NUMA node(s):          1
>>>>>>>> Vendor ID:             AuthenticAMD
>>>>>>>> CPU family:            21
>>>>>>>> Model:                 2
>>>>>>>> Stepping:              0
>>>>>>>> CPU MHz:               1400.000
>>>>>>>> BogoMIPS:              5200.57
>>>>>>>> Virtualization:        AMD-V
>>>>>>>> L1d cache:             16K
>>>>>>>> L1i cache:             64K
>>>>>>>> L2 cache:              2048K
>>>>>>>> L3 cache:              6144K
>>>>>>>> NUMA node0 CPU(s):     0-11
>>>>>>>> MPI is already installed. I also have permission to use the
>>>>>>>> cluster as much as I can.
>>>>>>>> My question is: how should I write my mdrun command to utilize
>>>>>>>> all the possible cores and nodes?
>>>>>>>> Thanks,
>>>>>>>> Thanh Le
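(To make the -pin/-pinoffset idea above concrete: with GROMACS 4.6 or
later, two independent single-node jobs sharing one 12-thread node could
be started roughly like this - an untested sketch; the file names are
placeholders and the right offsets depend on how the hardware threads are
numbered on the node:)

mdrun -nt 6 -pin on -pinoffset 0 -deffnm job1 &   # first job on hardware threads 0-5
mdrun -nt 6 -pin on -pinoffset 6 -deffnm job2 &   # second job on hardware threads 6-11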
--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send
a mail to gmx-users-requ...@gromacs.org.