Dieter,

the observation that the 8-core run is slower than the 4-core run is probably not due to CPU hyperthreading, as you suggest. The CPU loads that you report also suggest otherwise. I agree with Mark that it is more likely due to the short time per iteration, i.e. the relatively high amount of overhead compared to the actual calculations. We noticed the same when using FPI. Use MPI or test a slower model, and this effect will probably disappear.
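For reference, the transfer type is selected via the parafile passed to nmfe72. A minimal sketch of the two variants, assuming the example parafiles that ship with NM7.2 (e.g. fpilinux8.pnm / mpilinux8.pnm on Linux; check your installation for the exact names):

```shell
# FPI (file message passing) run on 4 nodes:
nmfe72 run1.ctl run1.res -parafile=fpilinux8.pnm [nodes]=4

# Same run with MPI, which in our tests was more efficient,
# especially when the time per iteration is short:
nmfe72 run1.ctl run1.res -parafile=mpilinux8.pnm [nodes]=4
```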

We also did some benchmarking, and noticed that NM7.2 can do pretty efficient parallelization. Our conclusions:
- MPI is much more efficient than FPI, especially for faster problems.
- The efficiency with MPI seems to hold across estimation methods (FOCE / BAYES / SAEM) and models (8 tested), at around 90% when using 5 cores. See results below.
- Parallelization efficiency depends on, e.g., time per iteration, transfer type, and the number of individuals in the dataset.
- Parallelization (MPI) was still efficient at higher numbers of cores; we tested up to 7 cores on one machine.
- In some basic tests, performance over network nodes seemed as good as when running on a single machine, although fair benchmarking is difficult on a production cluster.

We tested using the gfortran compiler, on a dedicated 8-core machine running Linux.

best regards,
Ron

--
-----------------------------------
Ron Keizer, PharmD PhD
Post-doctoral fellow
Pharmacometrics Research Group
Uppsala University
-----------------------------------



Table 1: multicore efficiency (times in sec; % = time relative to the single-core run)
| tt  | n cores | time_FOCE |   % | time_BAYES |   % |
|-----+---------+-----------+-----+------------+-----|
| -   |       1 |  13462.69 | 100 |    5283.78 | 100 |
| FPI |       2 |   7269.35 |  54 |    3096.51 |  58 |
| FPI |       3 |   5081.05 |  38 |    2470.52 |  46 |
| FPI |       4 |   4211.93 |  31 |    2709.43 |  51 |
| FPI |       5 |   3667.43 |  27 |     2729.8 |  51 |
| FPI |       6 |   3464.34 |  26 |    3254.91 |  61 |
|-----+---------+-----------+-----+------------+-----|
| -   |       1 |  13462.69 | 100 |    5283.78 | 100 |
| MPI |       2 |   7122.48 |  53 |    2731.38 |  51 |
| MPI |       3 |   4826.77 |  36 |    1853.94 |  35 |
| MPI |       4 |   3705.35 |  28 |    1464.69 |  27 |
| MPI |       5 |   2976.36 |  22 |    1179.11 |  22 |
| MPI |       6 |   2519.89 |  19 |    1011.94 |  19 |

Table 2: efficiency across different models (distributed to 5 cores, t in sec; t% = t_mpi5 / t_orig, eff% = t_orig / (5 * t_mpi5))
| mo | model  | est   | n_ind | iter |   t_orig |  t_mpi5 |    t% |  eff% |
|----+--------+-------+-------+------+----------+---------+-------+-------|
| M1 | ADVAN6 | FOCEI |     9 |   16 |   5863.0 | 1881.88 |  32.1 | 62.31 |
| M2 | ADVAN6 | FOCEI |   454 |   28 |   4485.3 |  930.38 | 20.74 | 96.42 |
| M3 | ADVAN6 | FOCEI |   412 |   20 |   363.84 |   78.23 |  21.5 | 93.02 |
| M4 | ADVAN6 | FOCE  |   105 |  486 | 13616.83 | 2979.52 | 21.88 |  91.4 |
| M5 | ADVAN6 | FOCEI |    42 |   45 | 14183.92 | 3167.56 | 22.33 | 89.56 |
| M6 | ADVAN6 | FOCEI |    39 |   43 |  4698.34 |  992.52 | 21.12 | 94.67 |
| M7 | ADVAN6 | FOCE  |   100 |   29 |    33249 | 7493.82 | 22.54 | 88.74 |
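The percentage columns in both tables follow from the reported wall-clock times. A short sketch of the two calculations (the formulas are implied by the columns, not stated explicitly above):

```python
# Parallel timing metrics from wall-clock times.
# t1: single-core time, tn: time on n cores (values taken from the tables).

def rel_time_pct(t1, tn):
    """Time on n cores as a percentage of the single-core time (Table 1 '%')."""
    return 100.0 * tn / t1

def efficiency_pct(t1, tn, n):
    """Parallel efficiency: speedup divided by the number of cores (Table 2 'eff%')."""
    return 100.0 * t1 / (n * tn)

# Table 1, MPI with 5 cores, FOCE:
print(round(rel_time_pct(13462.69, 2976.36)))          # -> 22

# Table 2, model M2 on 5 cores:
print(round(efficiency_pct(4485.3, 930.38, 5), 2))     # -> 96.42
```

Note that the two metrics are reciprocal at a fixed core count: 22% of the original run time on 5 cores corresponds to an efficiency of roughly 100 / (5 * 0.22) ≈ 91%.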


On 5/20/11 9:36 PM, Dieter Menne wrote:
Here are some quick-and-dirty results of my first benchmark with parallel
processing in NONMEM 7.2.

Running Win7, 64-bit, Intel i7, with 4 CPUs (and 4 hyperthreaded cores). One
computer only.

Using file message passing (FPI). Could not get MPI to work in this configuration.

call nmfe72 mtl_KPreM2Pre_T2L2_.ctl -parafile=fpiwini8.pnm [nodes]=(1, 4, or 8)

10 iterations of a very large Bayes problem (which should not profit from
multiple cores, according to the manual)

nodes    time
1        45 s
4        25 s
8        40 s

So about a factor of 2 between 1 and 4 cores.

It is not surprising that 8 gives worse values, because these are not real
CPUs. More surprising is that with 8 "CPUs" I see 100% load on all
of them (huh?), while with 4 CPUs I see the expected 50%.

Dieter






