Hi,

I compared the .log file time accounting for the same .tpr file run alone in serial and as part of an REMD simulation (with each replica on a single processor). It ran about 6% slower in the latter. The effect was larger (about 14%) when comparing the same .tpr run alone on 8 processors with REMD using 8 processors per replica. The effect seems fairly independent of whether I compare the lowest or highest replica.

The system is 1 ns of Ace-(Ala)_10-NME in CHARMM27 with GROMACS 4.5.3, using NVT with v-rescale T-coupling, PME, virtual sites, 4 fs time steps and rlist=rvdw=rcoulomb=1.0 nm. The REMD run uses 20 replicas distributed exponentially from 298 K to 431.57 K. The machine has two quad-core processors per node and an InfiniBand interconnect. The InfiniBand switch is shared with other users' calculations, so some load-based variability can and does occur, but this should have shown up in a named part of the time accounting.
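For readers reproducing the setup, here is a small sketch of the temperature ladder and step count. It assumes "distributed exponentially" means geometric spacing, T_i = T_min * (T_max/T_min)^(i/(N-1)), so that every neighbouring pair has the same temperature ratio; the function name `geometric_ladder` is purely illustrative and not part of GROMACS.

```python
def geometric_ladder(t_min, t_max, n):
    """Assumed geometric ladder: T[i+1]/T[i] is the same for every neighbouring pair."""
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio**i for i in range(n)]

temps = geometric_ladder(298.0, 431.57, 20)
for i, t in enumerate(temps):
    print(f"replica {i:2d}: {t:7.2f} K")

# 1 ns = 1,000,000 fs at 4 fs per step. The 250001 in the "Number" column of the
# logs is consistent with nsteps + 1 force evaluations (steps 0 through nsteps).
n_steps = 1_000_000 // 4
print(n_steps)  # 250000
```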

My first thought was that REMD exchange latency was to blame, so I quickly hacked in a change to report the time spent in the REMD initialization routine and in each call to the REMD exchange-attempt routine.

Comparing the lowest replica on a single processor under REMD (lines marked <) with the same .tpr run alone (lines marked >), diff gives:
   Computing:         Nodes     Number     G-Cycles    Seconds     %
7394,7403c6910,6918
<  Vsite constr.          1     250001       40.271       13.8     0.7
<  Neighbor search        1      25011      434.982      148.7     7.1
<  Force                  1     250001     3607.375     1232.8    59.1
<  PME mesh               1     250001     1270.407      434.1    20.8
<  Vsite spread           1     500002       41.671       14.2     0.7
<  Write traj.            1          3        7.873        2.7     0.1
<  Update                 1     250001       82.822       28.3     1.4
<  Constraints            1     250001      154.231       52.7     2.5
<  REMD                   1        100       59.070       20.2     1.0
<  Rest                   1                 409.862      140.1     6.7
---
>  Vsite constr.          1     250001       40.526       13.8     0.7
>  Neighbor search        1      25001      434.871      148.6     7.5
>  Force                  1     250001     3601.463     1230.8    62.2
>  PME mesh               1     250001     1292.675      441.8    22.3
>  Vsite spread           1     500002       41.479       14.2     0.7
>  Write traj.            1          3       17.153        5.9     0.3
>  Update                 1     250001       82.114       28.1     1.4
>  Constraints            1     250001      154.426       52.8     2.7
>  Rest                   1                 122.023       41.7     2.1
7405c6920
<  Total                  1                6108.562     2087.5   100.0
---
>  Total                  1                5786.731     1977.5   100.0
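The per-row changes in the diff above can be tallied to see where the extra cycles go. This sketch just copies the G-cycle column into a list (the standalone run has no REMD row, so 0.0 is used there) and sorts the REMD-minus-standalone differences by magnitude; `ROWS` and `deltas` are illustrative names only.

```python
# Rows copied from the lowest-replica, single-processor diff:
# (section, G-cycles under REMD, G-cycles run alone).
ROWS = [
    ("Vsite constr.",     40.271,   40.526),
    ("Neighbor search",  434.982,  434.871),
    ("Force",           3607.375, 3601.463),
    ("PME mesh",        1270.407, 1292.675),
    ("Vsite spread",      41.671,   41.479),
    ("Write traj.",        7.873,   17.153),
    ("Update",            82.822,   82.114),
    ("Constraints",      154.231,  154.426),
    ("REMD",              59.070,    0.0),   # no REMD row in the standalone log
    ("Rest",             409.862,  122.023),
]

def deltas(rows):
    """Return (section, REMD minus standalone) pairs, largest absolute change first."""
    return sorted(((name, a - b) for name, a, b in rows),
                  key=lambda item: abs(item[1]), reverse=True)

changes = deltas(ROWS)
for name, d in changes:
    print(f"{name:16s} {d:+9.3f}")
print(f"sum of rows: {sum(d for _, d in changes):+.3f} G-cycles")
```

The sum of the row differences agrees with the difference of the two "Total" rows to within rounding, and "Rest" dominates it.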

So "Rest" goes up from 122 to 410 G-cycles (41.7 s to 140.1 s) under REMD, even after factoring out the 59 G-cycles (20.2 s) actually spent in REMD. With the highest replica:

   Computing:         Nodes     Number     G-Cycles    Seconds     %
7394,7403c6910,6918
<  Vsite constr.          1     250001       40.261       13.8     0.7
<  Neighbor search        1      25016      434.878      148.6     7.1
<  Force                  1     250001     3606.913     1232.6    59.0
<  PME mesh               1     250001     1264.716      432.2    20.7
<  Vsite spread           1     500002       41.268       14.1     0.7
<  Write traj.            1          3        7.113        2.4     0.1
<  Update                 1     250001       82.491       28.2     1.4
<  Constraints            1     250001      153.207       52.4     2.5
<  REMD                   1        100       60.272       20.6     1.0
<  Rest                   1                 417.399      142.6     6.8
---
>  Vsite constr.          1     250001       40.518       13.8     0.7
>  Neighbor search        1      25001      435.069      148.7     7.6
>  Force                  1     250001     3609.196     1233.4    62.6
>  PME mesh               1     250001     1283.082      438.5    22.3
>  Vsite spread           1     500002       41.825       14.3     0.7
>  Write traj.            1          3       13.063        4.5     0.2
>  Update                 1     250001       82.011       28.0     1.4
>  Constraints            1     250001      154.350       52.7     2.7
>  Rest                   1                 102.249       34.9     1.8
7405c6920
<  Total                  1                6108.520     2087.5   100.0
---
>  Total                  1                5761.363     1968.8   100.0

Here "Rest" goes from 102 to 417 G-cycles (34.9 s to 142.6 s), despite factoring out the 60 G-cycles (20.6 s) reported for REMD. So the time spent doing the exchanges is noticeable, but well short of the observed increase in total time.
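To put a number on "well short", this sketch computes what fraction of the extra wall time in the highest-replica comparison the REMD row can account for, using the Seconds column of that diff. The helper `share_explained` is a hypothetical name for illustration.

```python
def share_explained(total_remd, total_alone, remd_row):
    """Fraction of the extra wall time that the REMD row accounts for."""
    extra = total_remd - total_alone
    return remd_row / extra

# Seconds copied from the highest-replica diff.
total_remd, total_alone = 2087.5, 1968.8
remd_row = 20.6
rest_remd, rest_alone = 142.6, 34.9

share = share_explained(total_remd, total_alone, remd_row)
print(f"extra wall time:    {total_remd - total_alone:.1f} s")
print(f"explained by REMD:  {100 * share:.0f}%")
print(f"extra time in Rest: {rest_remd - rest_alone:.1f} s")
```

The REMD row covers well under a fifth of the extra time, while the growth of "Rest" alone is close to the whole difference.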

For the lowest replica in parallel:

   Computing:         Nodes     Number     G-Cycles    Seconds     %
8481,8496c7971,7985
<  Domain decomp.         8      25010      152.338       52.1     1.8
<  DD comm. load          8      24226        1.085        0.4     0.0
<  DD comm. bounds        8      24219        4.167        1.4     0.0
<  Vsite constr.          8     250001       62.857       21.5     0.8
<  Comm. coord.           8     250001      132.068       45.1     1.6
<  Neighbor search        8      25010      367.001      125.4     4.4
<  Force                  8     250001     3446.528     1177.8    41.2
<  Wait + Comm. F         8     250001      252.245       86.2     3.0
<  PME mesh               8     250001     2113.009      722.1    25.3
<  Vsite spread           8     500002      102.749       35.1     1.2
<  Write traj.            8          1        1.206        0.4     0.0
<  Update                 8     250001       85.793       29.3     1.0
<  Constraints            8     250001      464.294      158.7     5.5
<  Comm. energies         8     250002       73.343       25.1     0.9
<  REMD                   8        100      162.661       55.6     1.9
<  Rest                   8                 945.642      323.2    11.3
---
>  Domain decomp.         8      25001      146.561       50.1     2.0
>  DD comm. load          8      22943        0.989        0.3     0.0
>  DD comm. bounds        8      22901        3.768        1.3     0.1
>  Vsite constr.          8     250001       64.035       21.9     0.9
>  Comm. coord.           8     250001      124.487       42.5     1.7
>  Neighbor search        8      25001      367.342      125.5     5.0
>  Force                  8     250001     3443.161     1176.7    46.9
>  Wait + Comm. F         8     250001      237.697       81.2     3.2
>  PME mesh               8     250001     2119.205      724.2    28.9
>  Vsite spread           8     500002       95.092       32.5     1.3
>  Write traj.            8          1        0.920        0.3     0.0
>  Update                 8     250001       85.529       29.2     1.2
>  Constraints            8     250001      391.469      133.8     5.3
>  Comm. energies         8     250002      120.291       41.1     1.6
>  Rest                   8                 139.127       47.5     1.9
8498c7987
<  Total                  8                8366.984     2859.3   100.0
---
>  Total                  8                7339.674     2508.3   100.0

Again, REMD exchanges are only a small fraction of the increase: "Rest" goes from 139 to 946 G-cycles (47.5 s to 323.2 s), while only 163 G-cycles (55.6 s) are accounted for by REMD.
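The three comparisons can be summarised side by side. This sketch takes the "Total", "Rest" and "REMD" rows (Seconds column) from each diff and reports the wall-time slowdown and the growth of "Rest" next to the time the REMD row reports; the `Case` class is an illustrative container, not anything from GROMACS.

```python
from dataclasses import dataclass

@dataclass
class Case:
    label: str
    total_remd: float   # "Total" row, REMD run (s)
    total_alone: float  # "Total" row, standalone run (s)
    rest_remd: float    # "Rest" row, REMD run (s)
    rest_alone: float   # "Rest" row, standalone run (s)
    remd_row: float     # "REMD" row, REMD run (s)

    def slowdown_pct(self) -> float:
        return 100.0 * (self.total_remd / self.total_alone - 1.0)

    def rest_increase(self) -> float:
        return self.rest_remd - self.rest_alone

CASES = [
    Case("lowest, 1 proc",  2087.5, 1977.5, 140.1, 41.7, 20.2),
    Case("highest, 1 proc", 2087.5, 1968.8, 142.6, 34.9, 20.6),
    Case("lowest, 8 procs", 2859.3, 2508.3, 323.2, 47.5, 55.6),
]

for c in CASES:
    print(f"{c.label:16s} slowdown {c.slowdown_pct():5.1f}%  "
          f"Rest +{c.rest_increase():6.1f} s  REMD {c.remd_row:5.1f} s")
```

In every case the growth of "Rest" is several times the REMD row, and the 8-processor slowdown is more than double the single-processor ones.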

Does anyone have a theory on what could be causing this?

Mark

--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users