Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options

2013-07-10 Thread Szilárd Páll
Just a note regarding the performance issues mentioned. You are
using reaction-field electrostatics, in which case by default there is
very little force workload left for the CPU (only the bonded
interactions), so the CPU idles most of the time. To improve performance,
use -nb gpu_cpu with multiple ranks (e.g. 2, 4, or 8) per GPU.
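
A minimal sketch of that suggestion (not taken from the thread): the Python helper below assembles an mdrun command line that starts several thread-MPI ranks on one GPU with -nb gpu_cpu. It assumes a thread-MPI build of 4.6, a single GPU with ID 0, and a hypothetical file prefix "topol"; the helper name is illustrative only.

```python
def mdrun_gpu_cpu_args(ranks_per_gpu, gpu_id=0, deffnm="topol"):
    """Build an mdrun argument list that shares one GPU among several ranks.

    Assumptions (not from the thread): thread-MPI build, one GPU,
    and a hypothetical run-file prefix `deffnm`.
    """
    if ranks_per_gpu < 1:
        raise ValueError("need at least one rank per GPU")
    if not 0 <= gpu_id <= 9:
        raise ValueError("this sketch only handles single-digit GPU IDs")
    return [
        "mdrun",
        "-ntmpi", str(ranks_per_gpu),            # number of thread-MPI ranks
        "-gpu_id", str(gpu_id) * ranks_per_gpu,  # one digit per rank, e.g. "0000"
        "-nb", "gpu_cpu",                        # split non-bonded work between GPU and CPU
        "-deffnm", deffnm,
    ]


print(" ".join(mdrun_gpu_cpu_args(4)))
```

Trying 2, 4, and 8 ranks and comparing the ns/day reported at the end of each log would show which split suits a given node.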
--
Szilárd



[gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options

2013-07-05 Thread Cara Kreck
Sorry, the tables got all messed up. I've converted them to just text now:

From: cara_...@hotmail.com
To: gmx-users@gromacs.org
Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up options
Date: Fri, 5 Jul 2013 17:31:04 +0800

Hi everyone,

I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster 
node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit 
solvent and comparing it to previous results from 3.3.3. Firstly I wanted to 
check if the reported reaction field issues of 4.5 were fixed 
(http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and 
then I wanted to check which was the most efficient way to run. Since my 
simulation made it to 100ns without crashing, I'm hopeful that RF is no longer 
an issue. I then ran several shorter (4.5 ns) simulations with slightly 
different options but the same (equilibrated) starting point to compare run 
times. Not surprisingly for RF, it was much quicker to use just CPUs and forget 
about the GPU.

However, when I did some basic analysis of my results, I found that there were 
some surprising differences between the runs. I then added in a couple of PME 
runs to verify that it wasn't RF specific. Temp and pressure were set to 303K 
and 1 bar, both with Berendsen.

                              Temperature    Potential E.   Pressure
System name  Details          Average  RMSD  Average  RMSD  Average  RMSD
3.3.3 c_md   RF nst5 group    306.0    1.4   -439029  466   0.998    125
4.6 c_md     RF nst5 group    303.9    1.4   -440455  461   0.0570   126
4.6 c_vv     RF nst5 verlet   303.0    1.2   -438718  478   1.96     134
4.6 g_md     RF nst20 verlet  303.0    1.4   -439359  3193  566      1139
4.6 g_vv     RF nst20 verlet  303.0    1.2   -438635  3048  34.3     405
4.6 c_pme    md nst5 group    303.0    1.4   -436138  461   0.135    125
4.6 g_pme    md nst40 verlet  303.0    1.4   -431621  463   416      1016

Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv 
integrator, etc. Verlet & group refer to the cut-off scheme and nst# refers to 
the nstlist frequency, which was automatically changed by gromacs. I found very 
similar results (and run times) for the GPU runs when -nb was set to gpu or 
gpu_cpu. The only other difference between runs is that in 3.3.3 only the 
bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp 
warning, but I don't know how significant that is.

It looks like the thermostat in 4.6 is more effective than in 3.3.3. According 
to the 3.3.3 log file, the average temp of the bilayer and solvent were 302.0K 
and 307.6K respectively, whereas the difference between the two is much smaller 
in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I don't know if this 
could be in any way related to the other discrepancies.

I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x 
RMSD). As it gave the best run time, this is the set-up I had hoped to use. I'm 
also surprised by how inaccurate the pressure calculations are and how large 
the RMSDs for P.E. (RF only) and pressure (RF & PME) are when the GPU is 
used.
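
For reference, the "~3x RMSD" figure can be reproduced from the averages in the table above. This is only a sketch of the arithmetic; the variable names are illustrative, and the RMSD here is the fluctuation of the instantaneous potential energy, not an error estimate on the average.

```python
# Average potential energies and RMSDs copied from the table above
# (3.3.3 c_md and 4.6 c_md rows, both RF with the group scheme).
pe_333, rmsd_333 = -439029.0, 466.0
pe_46, rmsd_46 = -440455.0, 461.0

# Difference between the two averages, in the same units as the table.
diff = pe_333 - pe_46

# Express the difference in units of each run's RMSD.
ratio_333 = diff / rmsd_333
ratio_46 = diff / rmsd_46

print(f"difference = {diff:.0f}")
print(f"difference / RMSD(3.3.3) = {ratio_333:.2f}")
print(f"difference / RMSD(4.6)   = {ratio_46:.2f}")
```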

I then looked at the energies of step 0 in the log files and found that several 
of the reported energy types varied, which I would have expected to be 
identical (for RF+group) or similar (for Verlet or PME) to 3.3.3 as they are 
all continuations from the same starting point.

System       LJ (SR)       Coulomb (SR)  Potential     Kinetic En.   Total Energy  Temperature   Pressure (bar)
3.3.3 c_md   1.80072E+04   -4.30514E+05  -4.38922E+05  6.14932E+04   -3.77429E+05  3.06083E+02   1.53992E+02
4.6 c_md     1.80072E+04   -4.30515E+05  -4.38922E+05  6.20484E+04   -3.76874E+05  3.08847E+02   1.56245E+02
4.6 c_vv     1.15784E+04   -4.83639E+05  -4.37388E+05  6.14748E+04   -3.75913E+05  3.05992E+02   -1.40193E+03
4.6 g_md     0.0E+00       0.0E+00       3.46728E+04   6.14991E+04   9.61719E+04   3.06113E+02   -1.70102E+04
4.6 g_vv     0.0E+00       0.0E+00       3.46728E+04   6.14748E+04   9.61476E+04   3.05992E+02   -1.85758E+04
4.6 c_pme    1.30512E+04   -3.37973E+05  -4.35821E+05  6.14989E+04   -3.74322E+05  3.06112E+02   4.50028E+02
4.6 g_pme    1.76523E+04   -4.89006E+05  -4.31207E+05  6.14990E+04   -3.69708E+05  3.06112E+02   4.37951E+02

Even 4.6 c_md has a different K.E. and therefore T.E., temp & pressure! How is 
that possible? There seems to be something weird going on when you combine RF 
with GPUs and/or the Verlet cut-off scheme, resulting in temporarily positive 
energies and/or negative pressures. I don't know if this matters in the end, 
but I thought it was odd that it only happens for RF. Recalculating the 
averages to ignore the weird step 0 values made negligible difference. 
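
As a sanity check on the step 0 numbers, the sketch below (illustrative names, values copied from the step 0 table) verifies two things: that Total Energy equals Potential plus Kinetic En. in every row, and that Kinetic En. divided by Temperature is the same constant in every row. If both hold, the reported temperatures follow directly from the kinetic energies with the same number of degrees of freedom, so the higher temperature of 4.6 c_md reflects a different kinetic energy rather than a different temperature definition.

```python
# (potential, kinetic, total, temperature) per run, copied from the step 0 table.
step0 = {
    "3.3.3 c_md": (-4.38922e5, 6.14932e4, -3.77429e5, 3.06083e2),
    "4.6 c_md":   (-4.38922e5, 6.20484e4, -3.76874e5, 3.08847e2),
    "4.6 c_vv":   (-4.37388e5, 6.14748e4, -3.75913e5, 3.05992e2),
    "4.6 g_md":   (3.46728e4,  6.14991e4,  9.61719e4, 3.06113e2),
    "4.6 g_vv":   (3.46728e4,  6.14748e4,  9.61476e4, 3.05992e2),
    "4.6 c_pme":  (-4.35821e5, 6.14989e4, -3.74322e5, 3.06112e2),
    "4.6 g_pme":  (-4.31207e5, 6.14990e4, -3.69708e5, 3.06112e2),
}

residuals = {}      # total - (potential + kinetic); should be ~0 within rounding
ke_per_kelvin = {}  # kinetic / temperature; constant if the degrees of freedom match

for name, (pot, kin, tot, temp) in step0.items():
    residuals[name] = tot - (pot + kin)
    ke_per_kelvin[name] = kin / temp
    print(f"{name:<11} residual = {residuals[name]:6.2f}   KE/T = {ke_per_kelvin[name]:.4f}")
```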

So 

Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options

2013-07-05 Thread David van der Spoel


RE: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options

2013-07-05 Thread Cara Kreck
 Date: Fri, 5 Jul 2013 12:38:15 +0200
 From: sp...@xray.bmc.uu.se
 To: gmx-users@gromacs.org
 Subject: Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with  
 various set-up options
 

Re: [gmx-users] FW: Inconsistent results between 3.3.3 and 4.6 with various set-up options

2013-07-05 Thread Mark Abraham
On Fri, Jul 5, 2013 at 11:52 AM, Cara Kreck cara_...@hotmail.com wrote:
 Sorry, the tables got all messed up. I've converted them to just text now:

 From: cara_...@hotmail.com
 To: gmx-users@gromacs.org
 Subject: Inconsistent results between 3.3.3 and 4.6 with various set-up 
 options
 Date: Fri, 5 Jul 2013 17:31:04 +0800

 Hi everyone,

 I have been doing some tests and benchmarks of Gromacs 4.6 on a GPU cluster 
 node (with and without GPU) with a 128 lipid bilayer (G53A6L FF) in explicit 
 solvent and comparing it to previous results from 3.3.3. Firstly I wanted to 
 check if the reported reaction field issues of 4.5 were fixed 
 (http://gromacs.5086.x6.nabble.com/Reaction-Filed-crash-tp4390619.html) and 
 then I wanted to check which was the most efficient way to run. Since my 
 simulation made it to 100ns without crashing, I'm hopeful that RF is no 
 longer an issue. I then ran several shorter (4.5 ns) simulations with 
 slightly different options but the same (equilibrated) starting point to 
 compare run times. Not surprisingly for RF, it was much quicker to use just 
 CPUs and forget about the GPU.

 However, when I did some basic analysis of my results, I found that there were 
 some surprising differences between the runs. I then added in a couple of PME 
 runs to verify that it wasn't RF specific. Temp and pressure were set to 303K 
 and 1 bar, both with Berendsen.

                               Temperature    Potential E.   Pressure
 System name  Details          Average  RMSD  Average  RMSD  Average  RMSD
 3.3.3 c_md   RF nst5 group    306.0    1.4   -439029  466   0.998    125
 4.6 c_md     RF nst5 group    303.9    1.4   -440455  461   0.0570   126
 4.6 c_vv     RF nst5 verlet   303.0    1.2   -438718  478   1.96     134
 4.6 g_md     RF nst20 verlet  303.0    1.4   -439359  3193  566      1139
 4.6 g_vv     RF nst20 verlet  303.0    1.2   -438635  3048  34.3     405
 4.6 c_pme    md nst5 group    303.0    1.4   -436138  461   0.135    125
 4.6 g_pme    md nst40 verlet  303.0    1.4   -431621  463   416      1016

 Where c_md indicates CPU only and md integrator, g_vv indicates GPU and md-vv 
 integrator, etc. Verlet & group refer to the cut-off scheme and nst# refers to 
 the nstlist frequency, which was automatically changed by gromacs. I found very 
 similar results (and run times) for the GPU runs when -nb was set to gpu or 
 gpu_cpu. The only other difference between runs is that in 3.3.3 only the 
 bilayer was listed for comm_grps. In 4.6 I added the solvent due to a grompp 
 warning, but I don't know how significant that is.

 It looks like the thermostat in 4.6 is more effective than in 3.3.3. 
 According to the 3.3.3 log file, the average temp of the bilayer and solvent 
 were 302.0K and 307.6K respectively, whereas the difference between the two 
 is much smaller in the 4.6 runs (1.3K for c_md and 0.2K for the rest). I 
 don't know if this could be in any way related to the other discrepancies.

I would say it shows quite clearly that the 3.3.3 RF regime had
significant cut-off based heating for all the reasons discussed in the
old RF thread you linked. There would seem to be some other effect
contributing to produce the temperature difference between rows 1 and
2 above. Using either Verlet lists or PME seems pretty good, though
;-)

 I am concerned about the P.E. difference between 3.3.3 c_md and 4.6 c_md (~3x 
 RMSD). As it gave the best run time, this is the set-up I had hoped to use.

I'd say the known poor quality of the 3.3.3 integration regime would
make this consideration insignificant.

 I'm also surprised by how inaccurate the pressure calculations are

One needs a lot of samples to measure something's value in the face of
RMSD three times its value. But perhaps nstcalcenergy > 1 is making
such an observation artificially tougher :-)
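
To put a rough number on "a lot of samples": for N statistically independent samples with fluctuation (RMSD) sigma, the standard error of the mean is sigma / sqrt(N). The sketch below inverts that relation. The function name and the target errors are illustrative assumptions, and consecutive MD frames are correlated, so the real requirement is larger than this estimate.

```python
import math


def samples_needed(rmsd, target_se):
    """Independent samples needed so that rmsd / sqrt(N) <= target_se."""
    if rmsd <= 0 or target_se <= 0:
        raise ValueError("rmsd and target_se must be positive")
    return math.ceil((rmsd / target_se) ** 2)


# Pressure RMSD of about 125 bar (CPU rows of the table above);
# the target standard errors are hypothetical.
for target in (1.0, 0.5):
    print(f"target SE {target} bar -> {samples_needed(125.0, target)} independent samples")
```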

 and how large the RMSDs for P.E. (RF only) and pressure (RF & PME) are 
 when the GPU is used.

Your reports of bad performance, atypical RMSD, and zero potentials
below indicate something is badly astray. Please open an issue at
www.redmine.org and attach your .tpr and .log files (tarball,
preferably). If we can reproduce those numbers, then they need fixing.
But so far I'm not suspecting the code is the issue :-)

 I then looked at the energies of step 0 in the log files

Using mdrun -rerun is a much more sound method, because you guarantee
neighbour searching and no integration.

 and found that several of the reported energy types varied, which I would 
 have expected to be identical (for RF+group) or similar (for Verlet or PME) 
 to 3.3.3 as they are all continuations from the same starting point.

Switching integrator to vv does not make a valid comparison, because
the interpretation of the velocities is now different by half a time
step. There's no particular reason to suspect these PME and RF numbers
are