>------------------------------
>
>Message: 8
>Date: Tue, 17 Jul 2012 18:40:05 +1000
>From: Mark Abraham <[email protected]>
>Subject: Re: [gmx-users] why Blue Gene/Q is so slow?
>To: Discussion list for GROMACS users <[email protected]>
>Message-ID: <[email protected]>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>On 17/07/2012 5:00 PM, DeChang Li wrote:
>> Dear all,
>>
>> I am running a 9000 atom system with GBSA (Gromacs 4.5.5) in a
>> Blue Gene/Q cluster. I got the speed 1.002 ns/day with 8 cores.
>> However, in my own workstation with 8 cores the same system can reach
>> nearly 10 ns/day (Intel(R) Xeon(R) CPU E5620 @ 2.40GHz). Can anyone
>> tell me what's wrong in my simulation? Any suggestion will be
>> appreciated.
>
>Your workstation is running highly effective optimized SSE loops.
>BlueGene/Q is not using its multiple FPU because that code hasn't been
>written (for explicit or implicit solvation), and BlueGene's processors
>are probably slower too.
>
>Mark

Does that mean the code itself limits BlueGene/Q to only about 10% of the
speed of the Intel workstation? Is there any way to improve the speed on
BG/Q?

Dechang

>> Following is my md.mdp file:
>>
>> constraints = hbonds
>> constraint_algorithm = LINCS
>> lincs_order = 4
>> comm_mode = Angular
>> comm_grps = system
>> integrator = sd
>> ;annealing = single single
>> ;annealing_npoints = 2 2
>> ;annealing_time = 0 500 0 500
>> ;annealing_temp = 200 300 200 300
>> dt = 0.002 ; ps !
>> nsteps = 5000000 ; total 5000 ps.
>> nstcomm = 10
>> nstcalcenergy = 10
>> nstxout = 10000 ; collect data every 1 ps
>> nstenergy = 10000
>> nstvout = 10000
>> nstlog = 1000
>> ;nstxtcout = 50000
>> ;xtc_grps = system
>> nstfout = 0
>> nstlist = 10
>> ns_type = grid
>> pbc = no
>> rlist = 1.2
>> coulombtype = cut-off
>> rcoulomb = 1.2
>> rvdw = 1.2
>> fourierspacing = 0.12
>> fourier_nx = 0
>> fourier_ny = 0
>> fourier_nz = 0
>> pme_order = 4
>> ewald_rtol = 1e-5
>> optimize_fft = yes
>> ;energygrps = alpha1 alpha2 alpha3 beta1 beta2 beta3 gamma
>> ;DispCorr = EnerPres
>> ; Berendsen temperature coupling is on in two groups
>> Tcoupl =
>> tau_t = 0.5
>> tc-grps = system
>> ref_t = 300
>> ; Pressure coupling is on
>> Pcoupl = no ;berendsen
>> tau_p = 1.0
>> compressibility = 4.5e-5
>> ref_p = 1.0
>> ; Generate velocities is on at 300 K.
>> gen_vel = yes
>> gen_temp = 300
>> gen_seed = -1
>>
>> implicit_solvent = GBSA
>> gb_algorithm = OBC
>> rgbradii = 1.2
>> sa_surface_tension = 2.25936
>>
>>
>>
>> Here is the performance info:
>>
>> M E G A - F L O P S   A C C O U N T I N G
>>
>> RF=Reaction-Field FE=Free Energy SCFE=Soft-Core/Free Energy
>> T=Tabulated W3=SPC/TIP3p W4=TIP4p (single or pairs)
>> NF=No Forces
>>
>> Computing:                        M-Number       M-Flops  % Flops
>> -----------------------------------------------------------------------------
>> Generalized Born Coulomb         61.482892      2951.179      0.4
>> GB Coulomb + LJ                2565.481100    156494.347     19.4
>> Outer nonbonded loop            152.268546      1522.685      0.2
>> 1,4 nonbonded interactions      116.143224     10452.890      1.3
>> Born radii (HCT/OBC)           2868.222234    524884.669     64.9
>> Born force chain rule          2868.222234     43023.334      5.3
>> NS-Pairs                        516.814696     10853.109      1.3
>> Reset In Box                      4.464788        13.394      0.0
>> CG-CoM                            4.482576        13.448      0.0
>> Bonds                            22.174434      1308.292      0.2
>> Angles                           80.586114     13538.467      1.7
>> Propers                         160.742142     36809.951      4.6
>> Virial                            4.636254        83.453      0.0
>> Update                           44.478894      1378.846      0.2
>> Stop-CM                           4.455894        44.559      0.0
>> Calc-Ekin                        44.487788      1201.170      0.1
>> Lincs                            44.951630      2697.098      0.3
>> Lincs-Mat                       261.822552      1047.290      0.1
>> Constraint-V                     44.951630       359.613      0.0
>> Constraint-Vir                    2.251163        54.028      0.0
>> -----------------------------------------------------------------------------
>> Total                                          808731.820    100.0
>> -----------------------------------------------------------------------------
>>
>>
>> D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>>
>> av. #atoms communicated per step for force: 2 x 660.5
>> av. #atoms communicated per step for LINCS: 2 x 34.3
>>
>> Average load imbalance: 1.7 %
>> Part of the total run time spent waiting due to load imbalance: 1.4 %
>>
>>
>> R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>>
>> Computing:        Nodes    Number    G-Cycles   Seconds       %
>> -----------------------------------------------------------------------
>> Domain decomp.        8       502      59.421      37.1     0.5
>> DD comm. load         8         8       0.004       0.0     0.0
>> Comm. coord.          8      5001      16.575      10.4     0.2
>> Neighbor search       8       502     136.093      85.1     1.2
>> Force                 8      5001    9744.582    6090.7    88.3
>> Wait + Comm. F        8      5001      90.905      56.8     0.8
>> Write traj.           8         2       0.954       0.6     0.0
>> Update                8      5001      72.936      45.6     0.7
>> Constraints           8     10002     171.445     107.2     1.6
>> Comm. energies        8       502      10.427       6.5     0.1
>> Rest                  8               732.742     458.0     6.6
>> -----------------------------------------------------------------------
>> Total                 8             11036.086    6897.9   100.0
>> -----------------------------------------------------------------------
>>
>> Parallel run - timing based on wallclock.
>>
>>                NODE (s)   Real (s)       (%)
>>        Time:    862.243    862.243     100.0
>>                        14:22
>>                (Mnbf/s)   (MFlops)  (ns/day)  (hour/ns)
>> Performance:      3.047    937.940     1.002     23.946
>> Finished mdrun on node 0 Tue Jul 17 16:06:48 2012
>
>

--
gmx-users mailing list    [email protected]
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Only plain text messages are allowed!
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the
www interface or send it to [email protected].
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
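
As a rough sanity check on the log above, the reported performance figures can be recomputed from the wall-clock time and the accounting totals. The sketch below is only illustrative: the function and constant names are made up for this example, and it assumes that the 5001 force evaluations in the cycle table equal the number of MD steps actually run, with dt = 0.002 ps taken from the md.mdp file. Under those assumptions it should reproduce the reported ns/day, hour/ns and MFlops values to within rounding, and it shows that the Born radii calculation accounts for most of the floating-point work.

```python
# Recompute the mdrun performance summary from values copied out of the log.
# All names here are hypothetical helpers, not part of GROMACS.

DT_PS = 0.002                  # time step from md.mdp, in ps
N_STEPS = 5001                 # "Force" count in the cycle table (assumed = MD steps run)
WALL_S = 862.243               # "Real (s)" wall-clock time
TOTAL_MFLOP = 808731.820       # "Total" row of the flops accounting
BORN_RADII_MFLOP = 524884.669  # "Born radii (HCT/OBC)" row


def ns_per_day(n_steps, dt_ps, wall_s):
    """Simulated nanoseconds per day of wall-clock time."""
    simulated_ns = n_steps * dt_ps / 1000.0
    return simulated_ns * 86400.0 / wall_s


def hours_per_ns(n_steps, dt_ps, wall_s):
    """Wall-clock hours needed to simulate one nanosecond."""
    simulated_ns = n_steps * dt_ps / 1000.0
    return (wall_s / 3600.0) / simulated_ns


def mflops(total_mflop, wall_s):
    """Average floating-point rate over the whole run, in MFlops."""
    return total_mflop / wall_s


if __name__ == "__main__":
    print(f"ns/day  : {ns_per_day(N_STEPS, DT_PS, WALL_S):.3f}")
    print(f"hour/ns : {hours_per_ns(N_STEPS, DT_PS, WALL_S):.3f}")
    print(f"MFlops  : {mflops(TOTAL_MFLOP, WALL_S):.3f}")
    share = 100.0 * BORN_RADII_MFLOP / TOTAL_MFLOP
    print(f"Born radii share of flops: {share:.1f} %")
```

The same functions can be applied to the workstation run's log to compare the two machines on equal terms, rather than relying on the approximate "nearly 10 ns/day" figure.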

