On Fri, Sep 15, 2017 at 1:06 AM, gromacs query <[email protected]> wrote:
> Hi Szilárd,
>
> Sorry this discussion is going on so long. I finally got one node empty and did some thorough tests, especially considering your first point (discrepancies when benchmarking jobs running on an empty node vs. an occupied node). I tested both ways.
>
> I ran the following cases (single job vs. two jobs, for 2 GPUs + 4 procs and also for 4 GPUs + 16 procs). Happy to send log files.
Please do share them, it's hard to assess what's going on without those.

> The pinoffset results are surprising (4th and 8th test cases below), though for case 8 I get a WARNING in the log file: "Requested offset too large for available cores" [this should not be an issue, as the first job binds the cores].

That means the offsets are not set correctly.

> As suggested, defining the affinity should help, with the pinoffset set 'manually' (in practice with a script), but these results are quite variable. I am a bit lost now: what should be the best practice when nodes are shared among different users? multidir can be tricky in such a case (if the other GROMACS users are not using the multidir option!).

I suggest fixing the above issue first. I don't fully understand what the descriptions below mean, please be more specific about the details or share logs.

> Sr. no.  Case (each job: 2 GPUs, 4 procs)                               Performance (ns/day)
> 1        only one job                                                   345
> 2        two identical jobs together (without pin on)                   16.1 and 15.9
> 3        two identical jobs together (without pin on, with -multidir)   178 and 191
> 4        two identical jobs together (pin on, pinoffset 0 and 5)        160 and 301
>
> Sr. no.  Case (each job: 4 GPUs, 16 procs)                              Performance (ns/day)
> 5        only one job                                                   694
> 6        two identical jobs together (without pin on)                   340 and 350
> 7        two identical jobs together (without pin on, with -multidir)   346 and 344
> 8        two identical jobs together (pin on, pinoffset 0 and 17)       204 and 546
>
> On Thu, Sep 14, 2017 at 12:02 PM, gromacs query <[email protected]> wrote:
>
>> Hi Szilárd,
>>
>> Here are my replies:
>>
>> Did you run the "fast" single job on an otherwise empty node? That might explain it, as when most of the CPU cores are left empty, modern CPUs increase clocks (turbo boost) on the used cores higher than they could with all cores busy.
>>
>> Yes, the "fast" single job was on an empty node. Sorry, I don't get it when you say 'modern CPUs increase clocks'; do you mean the ns/day I get in that case is not representative?
>>
>> and if you post an actual log I can certainly give more informed comments
>>
>> Sure, if it's OK, can I post it to you off the mailing list?
>>
>> However, note that if you are sharing a node with others and their jobs are not correctly affinitized, those processes will affect the performance of your job.
>>
>> Yes, exactly. In this case I would need to set the pinoffset manually, but this can be a bit frustrating if other GROMACS users are not binding :) Would it be possible to fix this in the default algorithm, though I am unaware of other issues it might cause? Also, multidir is sometimes not convenient: when a job crashes in the middle, an automatic restart from the cpt file would be difficult.
>>
>> -J
>>
>> On Thu, Sep 14, 2017 at 11:26 AM, Szilárd Páll <[email protected]> wrote:
>>
>>> On Wed, Sep 13, 2017 at 11:14 PM, gromacs query <[email protected]> wrote:
>>> > Hi Szilárd,
>>> >
>>> > Thanks again. I have now tried -multidir like this:
>>> >
>>> > mpirun -np 16 gmx_mpi mdrun -s test -ntomp 2 -maxh 0.1 -multidir t1 t2 t3 t4
>>> >
>>> > So this runs 4 jobs on the same node, so for each job np = 16/4, and each job uses 2 GPUs. I now get much improved and equal performance for each job (~220 ns/day), though still slightly less than a single independent job (where I get 300 ns/day). I can live with that, but -
>>>
>>> That is not normal and it is more likely to be a benchmarking discrepancy: you are likely not comparing apples to apples. Did you run the "fast" single job on an otherwise empty node?
>>> That might explain it: when most of the CPU cores are left empty, modern CPUs increase the clocks (turbo boost) on the used cores higher than they could with all cores busy.
>>>
>>> > Surprised: there are at most 40 cores and 8 GPUs per node, and thus my 4 jobs should consume all 8 GPUs.
>>>
>>> Note that even if those are 40 real cores (rather than 20 cores with HyperThreading), the current GROMACS release is unlikely to run efficiently with fewer than about 6-8 cores per GPU. This will likely change with the next release.
>>>
>>> > So I am a bit surprised by the fact that the same node on which my four jobs were running was already occupied with jobs by some other user, which I think should not happen (maybe a slurm.conf admin issue?). Either some of my jobs should have gone into the queue, or they should have run on another node if one was free.
>>>
>>> Sounds like a job scheduler issue (you can always check the detected hardware in the log) -- and if you post an actual log I can certainly give more informed comments.
>>>
>>> > What to do: Importantly, though, as an individual user I can submit a -multidir job, but let's say, as is normally the case, there are many other unknown users who each submit one or two jobs; in that case performance will be an issue (which is equivalent to my case when I submit many jobs without -multi/-multidir).
>>>
>>> Not sure I follow: if you always have a number of similar runs to do, submit them together and benefit from not having to do manual hardware assignment. Otherwise, if your cluster relies on node sharing, you will have to make sure that you specify the affinity/binding arguments to your job scheduler correctly (or work around it with manual offset calculation). However, note that if you are sharing a node with others and their jobs are not correctly affinitized, those processes will affect the performance of your job.
>>>
>>> > I think they will still need -pinoffset. Could you please suggest what best can be done in such a case?
>>>
>>> See above.
>>>
>>> Cheers,
>>> --
>>> Szilárd
>>>
>>> > -Jiom
>>> >
>>> > On Wed, Sep 13, 2017 at 9:15 PM, Szilárd Páll <[email protected]> wrote:
>>> >
>>> >> Hi,
>>> >>
>>> >> First off, have you considered option 2), using multi-sim? That would allow you not to have to bother with manually setting offsets. Can you not submit your jobs such that you fill at least a node?
>>> >>
>>> >> How many threads/cores does your node have? Can you share log files?
>>> >>
>>> >> Cheers,
>>> >> --
>>> >> Szilárd
>>> >>
>>> >> On Wed, Sep 13, 2017 at 9:14 PM, gromacs query <[email protected]> wrote:
>>> >> > Hi Szilárd,
>>> >> >
>>> >> > Sorry, I was a bit quick to say it's working with pinoffset. I just submitted four identical jobs (2 GPUs, 4 nprocs) on the same node with -pin on and different -pinoffset values of 0, 5, 10, 15 (the numbers should be fine as there are 40 cores on the node). Still, I don't get the performance expected from a single independent job (all jobs are variably below 50% of it). Now I am wondering whether it is still related to an overlap of cores, as -pin on should lock the cores for the same job.
>>> >> >
>>> >> > -J
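(On the core-overlap question just above: whether concurrently running jobs really end up on disjoint cores can be checked directly on the node. A minimal sketch, assuming a Linux node where pgrep and taskset are available and where the mdrun ranks can be matched by the process name gmx_mpi; the match pattern is only illustrative:

    # print the CPU affinity list of every running mdrun rank on this node
    for pid in $(pgrep -f 'gmx_mpi mdrun'); do
        taskset -cp "$pid"
    done

If ranks belonging to different jobs report overlapping CPU lists, the offsets are wrong. Keep in mind that -pinoffset counts logical cores/hardware threads, so the second job's offset typically needs to be at least the total number of threads the first job occupies (ranks x -ntomp), and offset plus thread count has to stay within the node's logical core count, otherwise mdrun emits the "Requested offset too large for available cores" warning quoted above.)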
>>> >> > On Wed, Sep 13, 2017 at 7:33 PM, gromacs query <[email protected]> wrote:
>>> >> >
>>> >> >> Hi Szilárd,
>>> >> >>
>>> >> >> Thanks, option 3 was on my mind, but I now need to figure out how :) Manually fixing the pinoffset seems to work for now, based on some quick tests. I think option 1 would require asking the admin, but I can try option 3 myself. As there are other users from different places who may not bother using option 3, I think I would need to ask the admin to enforce option 1, but before that I will try option 3.
>>> >> >>
>>> >> >> JIom
>>> >> >>
>>> >> >> On Wed, Sep 13, 2017 at 7:10 PM, Szilárd Páll <[email protected]> wrote:
>>> >> >>
>>> >> >>> J,
>>> >> >>>
>>> >> >>> You have a few options:
>>> >> >>>
>>> >> >>> * Use SLURM to assign not only the set of GPUs, but also the correct set of CPU cores to each mdrun process. If you do so, mdrun will respect the affinity mask it inherits and your two mdrun jobs should be running on the right sets of cores. This has the drawback that (AFAIK) SLURM/aprun (or srun) will not allow you to bind each application thread to a core/hardware thread (which is what mdrun does), only a process to a group of cores/hardware threads, which can sometimes lead to performance loss. (You might be able to compensate using some OpenMP library environment variables, though.)
>>> >> >>>
>>> >> >>> * Run multiple jobs with mdrun "-multi"/"-multidir" (either two per node or multiple across nodes) and benefit from the rank/thread to core/hardware thread assignment that is also supported across the multiple simulations of a multi-run; e.g.
>>> >> >>> mpirun -np 4 gmx mdrun -multi 4 -ntomp N -multidir my_input_dir{1,2,3,4}
>>> >> >>> will launch 4 ranks and start 4 simulations, one in each of the four directories passed.
>>> >> >>>
>>> >> >>> * Write a wrapper script around gmx mdrun which will be what you launch with SLURM; you can then inspect the node and decide what pinoffset value to pass to your mdrun launch command.
>>> >> >>>
>>> >> >>> I hope one of these will deliver the desired results :)
>>> >> >>>
>>> >> >>> Cheers,
>>> >> >>> --
>>> >> >>> Szilárd
>>> >> >>>
>>> >> >>> On Wed, Sep 13, 2017 at 7:47 PM, gromacs query <[email protected]> wrote:
>>> >> >>> > Hi Szilárd,
>>> >> >>> >
>>> >> >>> > Thanks for your reply. This is useful, but now I am thinking that because slurm launches jobs in an automated way, it is not really in my control to choose the node. So the following things can happen; say for two mdrun jobs I set -pinoffset 0 and -pinoffset 4:
>>> >> >>> >
>>> >> >>> > - if they are running on the same node, this is good
>>> >> >>> > - if the jobs run on different nodes (partially occupied or free), these chosen pinoffsets may or may not make sense, as I don't know what pinoffset I would need to set
>>> >> >>> > - if I have to submit many jobs together and slurm itself chooses different/the same nodes, then I think it is difficult to define the pinoffset.
>>> >> >>> >
>>> >> >>> > -
>>> >> >>> > J
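(On the wrapper-script option quoted above, which also speaks to the concerns just listed: the idea is simply that the command you submit inspects the node it lands on and derives a free offset before launching mdrun, so the same script works whichever node the scheduler picks. A minimal sketch, assuming all jobs on a node are identical gmx_mpi runs with the same rank and -ntomp counts; the file name, variables and the process-counting heuristic are purely illustrative, and it ignores the race where two jobs start at exactly the same moment:

    #!/bin/bash
    # launch_mdrun.sh NAME -- choose -pinoffset from what is already running on this node
    NRANKS=4          # MPI ranks per job
    NTOMP=2           # OpenMP threads per rank
    # ranks of other mdrun jobs already on this node (pgrep -c prints 0 if none)
    BUSY_RANKS=$(pgrep -c -f 'gmx_mpi mdrun')
    OFFSET=$(( BUSY_RANKS * NTOMP ))
    mpirun -np "$NRANKS" gmx_mpi mdrun -deffnm "$1" -ntomp "$NTOMP" \
           -pin on -pinoffset "$OFFSET"

Submitting such a script instead of the bare mpirun line removes the guesswork about which offset to pick; having SLURM itself hand each job a disjoint core set, or folding the runs into one -multidir launch, achieves the same without the heuristic.)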
>>> >> >>> > On Wed, Sep 13, 2017 at 6:14 PM, Szilárd Páll <[email protected]> wrote:
>>> >> >>> >
>>> >> >>> >> My guess is that the two jobs are using the same cores -- either all cores/threads or only half of them, but the same set.
>>> >> >>> >>
>>> >> >>> >> You should use -pinoffset; see:
>>> >> >>> >>
>>> >> >>> >> - Docs and example: http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html
>>> >> >>> >>
>>> >> >>> >> - More explanation of the thread-pinning behavior on the old website: http://www.gromacs.org/Documentation/Acceleration_and_parallelization#Pinning_threads_to_physical_cores
>>> >> >>> >>
>>> >> >>> >> Cheers,
>>> >> >>> >> --
>>> >> >>> >> Szilárd
>>> >> >>> >>
>>> >> >>> >> On Wed, Sep 13, 2017 at 6:35 PM, gromacs query <[email protected]> wrote:
>>> >> >>> >> > Sorry, forgot to add: we thought the two jobs were using the same GPU ids, but CUDA visible devices show the two jobs are using different ids (0,1 and 2,3).
>>> >> >>> >> >
>>> >> >>> >> > -
>>> >> >>> >> > J
>>> >> >>> >> >
>>> >> >>> >> > On Wed, Sep 13, 2017 at 5:33 PM, gromacs query <[email protected]> wrote:
>>> >> >>> >> >
>>> >> >>> >> >> Hi All,
>>> >> >>> >> >>
>>> >> >>> >> >> I have some issues with GROMACS performance. There are many nodes, each node has a number of GPUs, and the batch processing is controlled by slurm. I get good performance with certain settings for the number of GPUs and nprocs, but when I submit the same job twice on the same node, the performance drops drastically. E.g.:
>>> >> >>> >> >>
>>> >> >>> >> >> For 2 GPUs I get 300 ns per day when there is no other job running on the node. When I submit the same job twice on the same node at the same time, I get only 17 ns/day for both jobs. I am using this:
>>> >> >>> >> >>
>>> >> >>> >> >> mpirun -np 4 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12
>>> >> >>> >> >>
>>> >> >>> >> >> Any suggestions highly appreciated.
>>> >> >>> >> >>
>>> >> >>> >> >> Thanks
>>> >> >>> >> >>
>>> >> >>> >> >> Jiom
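(For completeness: applied to the original two-job case above, the multi-sim route discussed earlier would amount to a single launch along these lines; the directory names are placeholders and each directory is assumed to hold a copy of test.tpr:

    mpirun -np 8 gmx_mpi mdrun -deffnm test -ntomp 2 -maxh 0.12 -multidir run1 run2

Both simulations then run under one mpirun, so mdrun can lay out the ranks, threads and GPUs across the node itself instead of the two jobs landing on the same cores.)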
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to [email protected].
