Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options
Just a note regarding the performance issues mentioned. You are using reaction-field electrostatics case in which by default there is very little force workload left for the CPU (only the bondeds) and therefore the CPU idles most of the time. To improve performance, use -nb gpu_cpu with multiple ranks (e.g. 2, 4, or 8) per GPU. -- Szilárd On Fri, Jul 5, 2013 at 9:26 PM, Mark Abraham mark.j.abra...@gmail.com wrote: On Fri, Jul 5, 2013 at 11:52 AM, Cara Kreck cara_...@hotmail.com wrote: Sorry, the tables got all messed up. I've converted them to just text now: From: cara_...@hotmail.com To: gmx-users@gromacs.org Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options Date: Fri, 5 Jul 2013 17:31:04 +0800 Hi everyone, I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit solvent and comparing it to previous results from 3.3.3. Firstly I wanted to check if the reported reaction field issues of 4.5 was fixed (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and then I wanted to check which was the most efficient way to run. Since my simulation made it to 100ns without crashing, I'm hopeful that RF is no longer an issue. I then ran several shorter (4.5 ns) simulations with slightly different options but the same (equilibrated) starting point to compare run times. Not surprisingly for RF, it was much quicker to use just CPUs and forget about the GPU. However, when I did some basic analysis of my results, I found that there was some surprising differences between the runs. I then added in a couple of PME runs to verify that it wasn't RF specific. Temp and pressure were set to 303K and 1 bar, both with Berendsen. TemperaturePotential E. Pressure System nameDetails AverageRMSDAverageRMSD AverageRMSD 3.3.3 c_md RF nst5 group306.0 1.4-439029466 0.998 125 4.6 c_md RF nst5 group303.9 1.4-440455461 0.0570 126 4.6 c_vv RF nst5 verlet 303.0 1.2-438718478 1.96 134 4.6 g_md RF nst20 verlet 303.0 1.4-4393593193566 1139 4.6 g_vv RF nst20 verlet 303.0 1.2-438635304834.3 405 4.6 c_pme md nst5 group303.0 1.4-436138461 0.135 125 4.6 g_pme md nst40 verlet 303.0 1.4-431621463 416 1016 Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv integrator, etc. Verlet group refer to cut-off scheme and nst# refers to nstlist frequency which was automatically changed by gromacs. I found very similar results (and run times) for the GPU runs when -nb was set to gpu or gpu_cpu. The only other difference between runs is that in 3.3.3 only the bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp warning, but I don't know how significant that is. It looks like the thermostat in 4.6 is more effective than in 3.3.3. According to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K and 307.6K respectively, whereas the difference between the two is much smaller in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I don't know if this could be in any way related to the other discrepancies. I would say it shows quite clearly that the 3.3.3 RF regime had significant cut-off based heating for all the reasons discussed in the old RF thread you linked. There would seem to be some other effect contributing to produce the temperature difference between rows 1 and 2 above. Using either Verlet lists or PME seems pretty good, though ;-) I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'd say the known poor quality of the 3.3.3 integration regime would make this consideration insignificant. I'm also surprised by how inaccurate the pressure calculations are One needs a lot of samples to measure something's value in the face of RMSD three times its value. But perhaps nstcalcenergy 1 is making such an observation artificially tougher :-) and how large the RMSDs are for P.E. (RF only) and pressure (RF PME) are when the GPU is used. Your reports of bad performance, atypical RSMD, and zero potentials below indicate something is badly astray. Please open an issue at www.redmine.org and attach your .tpr and .log files (tarball, preferably). If we can reproduce those numbers, then they need fixing. But so far I'm not suspecting the code is the issue :-) I then looked at the energies of step 0 in the log files Using mdrun -rerun is a much more sound method, because you guarantee neighbour searching and no integration. and
[gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options
Sorry, the tables got all messed up. I've converted them to just text now: From: cara_...@hotmail.com To: gmx-users@gromacs.org Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options Date: Fri, 5 Jul 2013 17:31:04 +0800 Hi everyone, I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit solvent and comparing it to previous results from 3.3.3. Firstly I wanted to check if the reported reaction field issues of 4.5 was fixed (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and then I wanted to check which was the most efficient way to run. Since my simulation made it to 100ns without crashing, I'm hopeful that RF is no longer an issue. I then ran several shorter (4.5 ns) simulations with slightly different options but the same (equilibrated) starting point to compare run times. Not surprisingly for RF, it was much quicker to use just CPUs and forget about the GPU. However, when I did some basic analysis of my results, I found that there was some surprising differences between the runs. I then added in a couple of PME runs to verify that it wasn't RF specific. Temp and pressure were set to 303K and 1 bar, both with Berendsen. TemperaturePotential E. Pressure System nameDetails AverageRMSDAverageRMSDAverage RMSD 3.3.3 c_md RF nst5 group306.0 1.4-439029466 0.998 125 4.6 c_md RF nst5 group303.9 1.4-440455461 0.0570 126 4.6 c_vv RF nst5 verlet 303.0 1.2-438718478 1.96 134 4.6 g_md RF nst20 verlet 303.0 1.4-4393593193566 1139 4.6 g_vv RF nst20 verlet 303.0 1.2-438635304834.3 405 4.6 c_pme md nst5 group303.0 1.4-436138461 0.135 125 4.6 g_pme md nst40 verlet 303.0 1.4-431621463 416 1016 Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv integrator, etc. Verlet group refer to cut-off scheme and nst# refers to nstlist frequency which was automatically changed by gromacs. I found very similar results (and run times) for the GPU runs when -nb was set to gpu or gpu_cpu. The only other difference between runs is that in 3.3.3 only the bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp warning, but I don't know how significant that is. It looks like the thermostat in 4.6 is more effective than in 3.3.3. According to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K and 307.6K respectively, whereas the difference between the two is much smaller in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I don't know if this could be in any way related to the other discrepancies. I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'm also surprised by how inaccurate the pressure calculations are and how large the RMSDs are for P.E. (RF only) and pressure (RF PME) are when the GPU is used. I then looked at the energies of step 0 in the log files and found that several of the reported energy types varied, which I would have expected to be identical (for RF+group) or similar (for Verlet or PME) to 3.3.3 as they are all continuations from the same starting point. SystemLJ (SR)Coulomb (SR)Potential Kinetic En. Total EnergyTemperaturePressure (bar) 3.3.3 c_md1.80072E+04-4.30514E+05-4.38922E+056.14932E+04 -3.77429E+053.06083E+021.53992E+02 4.6 c_md 1.80072E+04-4.30515E+05-4.38922E+056.20484E+04 -3.76874E+053.08847E+021.56245E+02 4.6 c_vv 1.15784E+04-4.83639E+05-4.37388E+056.14748E+04 -3.75913E+053.05992E+02-1.40193E+03 4.6 g_md 0.0E+00 0.0E+00 3.46728E+046.14991E+04 9.61719E+043.06113E+02-1.70102E+04 4.6 g_vv 0.0E+00 0.0E+00 3.46728E+046.14748E+04 9.61476E+043.05992E+02-1.85758E+04 4.6 c_pme 1.30512E+04-3.37973E+05-4.35821E+056.14989E+04 -3.74322E+053.06112E+024.50028E+02 4.6 g_pme 1.76523E+04-4.89006E+05-4.31207E+056.14990E+04 -3.69708E+053.06112E+024.37951E+02 Even 4.6 c_md has a different K.E. and therefore T.E, temp pressure! How is that possible? There seems to be something weird going on when you combine RF with GPUs and/or the Verlet cut-off scheme, resulting in temporarily positive energies and/or negative pressures. I don't know if this matters in the end, but I thought it was odd that it only happens for RF. Recalculating the averages to ignore the weird step 0 values made negligible difference. So
Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options
On 2013-07-05 11:52, Cara Kreck wrote: Sorry, the tables got all messed up. I've converted them to just text now: From: cara_...@hotmail.com To: gmx-users@gromacs.org Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options Date: Fri, 5 Jul 2013 17:31:04 +0800 Hi everyone, I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit solvent and comparing it to previous results from 3.3.3. Firstly I wanted to check if the reported reaction field issues of 4.5 was fixed (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and then I wanted to check which was the most efficient way to run. Since my simulation made it to 100ns without crashing, I'm hopeful that RF is no longer an issue. I then ran several shorter (4.5 ns) simulations with slightly different options but the same (equilibrated) starting point to compare run times. Not surprisingly for RF, it was much quicker to use just CPUs and forget about the GPU. However, when I did some basic analysis of my results, I found that there was some surprising differences between the runs. I then added in a couple of PME runs to verify that it wasn't RF specific. Temp and pressure were set to 303K and 1 bar, both with Berendsen. TemperaturePotential E. Pressure System nameDetails AverageRMSDAverageRMSDAverage RMSD 3.3.3 c_md RF nst5 group306.0 1.4-439029466 0.998 125 4.6 c_md RF nst5 group303.9 1.4-440455461 0.0570 126 4.6 c_vv RF nst5 verlet 303.0 1.2-438718478 1.96 134 4.6 g_md RF nst20 verlet 303.0 1.4-4393593193566 1139 4.6 g_vv RF nst20 verlet 303.0 1.2-438635304834.3 405 4.6 c_pme md nst5 group303.0 1.4-436138461 0.135 125 4.6 g_pme md nst40 verlet 303.0 1.4-431621463 416 1016 Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv integrator, etc. Verlet group refer to cut-off scheme and nst# refers to nstlist frequency which was automatically changed by gromacs. I found very similar results (and run times) for the GPU runs when -nb was set to gpu or gpu_cpu. The only other difference between runs is that in 3.3.3 only the bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp warning, but I don't know how significant that is. It looks like the thermostat in 4.6 is more effective than in 3.3.3. According to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K and 307.6K respectively, whereas the difference between the two is much smaller in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I don't know if this could be in any way related to the other discrepancies. I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'm also surprised by how inaccurate the pressure calculations are and how large the RMSDs are for P.E. (RF only) and pressure (RF PME) are when the GPU is used. I then looked at the energies of step 0 in the log files and found that several of the reported energy types varied, which I would have expected to be identical (for RF+group) or similar (for Verlet or PME) to 3.3.3 as they are all continuations from the same starting point. SystemLJ (SR)Coulomb (SR)Potential Kinetic En. Total EnergyTemperaturePressure (bar) 3.3.3 c_md1.80072E+04-4.30514E+05-4.38922E+056.14932E+04 -3.77429E+053.06083E+021.53992E+02 4.6 c_md 1.80072E+04-4.30515E+05-4.38922E+056.20484E+04 -3.76874E+053.08847E+021.56245E+02 4.6 c_vv 1.15784E+04-4.83639E+05-4.37388E+056.14748E+04 -3.75913E+053.05992E+02-1.40193E+03 4.6 g_md 0.0E+00 0.0E+00 3.46728E+046.14991E+04 9.61719E+043.06113E+02-1.70102E+04 4.6 g_vv 0.0E+00 0.0E+00 3.46728E+046.14748E+04 9.61476E+043.05992E+02-1.85758E+04 4.6 c_pme 1.30512E+04-3.37973E+05-4.35821E+056.14989E+04 -3.74322E+053.06112E+024.50028E+02 4.6 g_pme 1.76523E+04-4.89006E+05-4.31207E+056.14990E+04 -3.69708E+053.06112E+024.37951E+02 Even 4.6 c_md has a different K.E. and therefore T.E, temp pressure! How is that possible? There seems to be something weird going on when you combine RF with GPUs and/or the Verlet cut-off scheme, resulting in temporarily positive energies and/or negative pressures. I don't know if this matters in the end, but I thought it was odd that it only happens for RF. Recalculating the averages to ignore the weird step 0
RE: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options
Date: Fri, 5 Jul 2013 12:38:15 +0200 From: sp...@xray.bmc.uu.se To: gmx-users@gromacs.org Subject: Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options On 2013-07-05 11:52, Cara Kreck wrote: Sorry, the tables got all messed up. I've converted them to just text now: From: cara_...@hotmail.com To: gmx-users@gromacs.org Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options Date: Fri, 5 Jul 2013 17:31:04 +0800 Hi everyone, I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit solvent and comparing it to previous results from 3.3.3. Firstly I wanted to check if the reported reaction field issues of 4.5 was fixed (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and then I wanted to check which was the most efficient way to run. Since my simulation made it to 100ns without crashing, I'm hopeful that RF is no longer an issue. I then ran several shorter (4.5 ns) simulations with slightly different options but the same (equilibrated) starting point to compare run times. Not surprisingly for RF, it was much quicker to use just CPUs and forget about the GPU. However, when I did some basic analysis of my results, I found that there was some surprising differences between the runs. I then added in a couple of PME runs to verify that it wasn't RF specific. Temp and pressure were set to 303K and 1 bar, both with Berendsen. TemperaturePotential E. Pressure System nameDetails AverageRMSDAverageRMSD AverageRMSD 3.3.3 c_md RF nst5 group306.0 1.4-439029466 0.998 125 4.6 c_md RF nst5 group303.9 1.4-440455461 0.0570 126 4.6 c_vv RF nst5 verlet 303.0 1.2-438718478 1.96 134 4.6 g_md RF nst20 verlet 303.0 1.4-4393593193566 1139 4.6 g_vv RF nst20 verlet 303.0 1.2-438635304834.3 405 4.6 c_pme md nst5 group303.0 1.4-436138461 0.135 125 4.6 g_pme md nst40 verlet 303.0 1.4-431621463 416 1016 Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv integrator, etc. Verlet group refer to cut-off scheme and nst# refers to nstlist frequency which was automatically changed by gromacs. I found very similar results (and run times) for the GPU runs when -nb was set to gpu or gpu_cpu. The only other difference between runs is that in 3.3.3 only the bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp warning, but I don't know how significant that is. It looks like the thermostat in 4.6 is more effective than in 3.3.3. According to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K and 307.6K respectively, whereas the difference between the two is much smaller in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I don't know if this could be in any way related to the other discrepancies. I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'm also surprised by how inaccurate the pressure calculations are and how large the RMSDs are for P.E. (RF only) and pressure (RF PME) are when the GPU is used. I then looked at the energies of step 0 in the log files and found that several of the reported energy types varied, which I would have expected to be identical (for RF+group) or similar (for Verlet or PME) to 3.3.3 as they are all continuations from the same starting point. SystemLJ (SR)Coulomb (SR)Potential Kinetic En. Total EnergyTemperaturePressure (bar) 3.3.3 c_md1.80072E+04-4.30514E+05-4.38922E+056.14932E+04 -3.77429E+053.06083E+021.53992E+02 4.6 c_md 1.80072E+04-4.30515E+05-4.38922E+056.20484E+04 -3.76874E+053.08847E+021.56245E+02 4.6 c_vv 1.15784E+04-4.83639E+05-4.37388E+056.14748E+04 -3.75913E+053.05992E+02-1.40193E+03 4.6 g_md 0.0E+00 0.0E+00 3.46728E+046.14991E+04 9.61719E+043.06113E+02-1.70102E+04 4.6 g_vv 0.0E+00 0.0E+00 3.46728E+046.14748E+04 9.61476E+043.05992E+02-1.85758E+04 4.6 c_pme 1.30512E+04-3.37973E+05-4.35821E+056.14989E+04 -3.74322E+053.06112E+024.50028E+02 4.6 g_pme 1.76523E+04-4.89006E+05-4.31207E+056.14990E+04 -3.69708E+053.06112E+024.37951E+02 Even 4.6 c_md has a different K.E
Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options
On Fri, Jul 5, 2013 at 11:52 AM, Cara Kreck cara_...@hotmail.com wrote: Sorry, the tables got all messed up. I've converted them to just text now: From: cara_...@hotmail.com To: gmx-users@gromacs.org Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options Date: Fri, 5 Jul 2013 17:31:04 +0800 Hi everyone, I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit solvent and comparing it to previous results from 3.3.3. Firstly I wanted to check if the reported reaction field issues of 4.5 was fixed (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and then I wanted to check which was the most efficient way to run. Since my simulation made it to 100ns without crashing, I'm hopeful that RF is no longer an issue. I then ran several shorter (4.5 ns) simulations with slightly different options but the same (equilibrated) starting point to compare run times. Not surprisingly for RF, it was much quicker to use just CPUs and forget about the GPU. However, when I did some basic analysis of my results, I found that there was some surprising differences between the runs. I then added in a couple of PME runs to verify that it wasn't RF specific. Temp and pressure were set to 303K and 1 bar, both with Berendsen. TemperaturePotential E. Pressure System nameDetails AverageRMSDAverageRMSDAverage RMSD 3.3.3 c_md RF nst5 group306.0 1.4-439029466 0.998 125 4.6 c_md RF nst5 group303.9 1.4-440455461 0.0570 126 4.6 c_vv RF nst5 verlet 303.0 1.2-438718478 1.96 134 4.6 g_md RF nst20 verlet 303.0 1.4-4393593193566 1139 4.6 g_vv RF nst20 verlet 303.0 1.2-438635304834.3 405 4.6 c_pme md nst5 group303.0 1.4-436138461 0.135 125 4.6 g_pme md nst40 verlet 303.0 1.4-431621463 416 1016 Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv integrator, etc. Verlet group refer to cut-off scheme and nst# refers to nstlist frequency which was automatically changed by gromacs. I found very similar results (and run times) for the GPU runs when -nb was set to gpu or gpu_cpu. The only other difference between runs is that in 3.3.3 only the bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp warning, but I don't know how significant that is. It looks like the thermostat in 4.6 is more effective than in 3.3.3. According to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K and 307.6K respectively, whereas the difference between the two is much smaller in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I don't know if this could be in any way related to the other discrepancies. I would say it shows quite clearly that the 3.3.3 RF regime had significant cut-off based heating for all the reasons discussed in the old RF thread you linked. There would seem to be some other effect contributing to produce the temperature difference between rows 1 and 2 above. Using either Verlet lists or PME seems pretty good, though ;-) I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'd say the known poor quality of the 3.3.3 integration regime would make this consideration insignificant. I'm also surprised by how inaccurate the pressure calculations are One needs a lot of samples to measure something's value in the face of RMSD three times its value. But perhaps nstcalcenergy 1 is making such an observation artificially tougher :-) and how large the RMSDs are for P.E. (RF only) and pressure (RF PME) are when the GPU is used. Your reports of bad performance, atypical RSMD, and zero potentials below indicate something is badly astray. Please open an issue at www.redmine.org and attach your .tpr and .log files (tarball, preferably). If we can reproduce those numbers, then they need fixing. But so far I'm not suspecting the code is the issue :-) I then looked at the energies of step 0 in the log files Using mdrun -rerun is a much more sound method, because you guarantee neighbour searching and no integration. and found that several of the reported energy types varied, which I would have expected to be identical (for RF+group) or similar (for Verlet or PME) to 3.3.3 as they are all continuations from the same starting point. Switching integrator to vv does not make a valid comparison, because the interpretation of the velocities is now different by half a time step. There's no particular reason to suspect these PME and RF numbers are