1) You didn't provide any details about the layout of your cluster. It is hard to guess whether you have several dual-core nodes with 16 GB of memory each, or 6-core nodes and you are running on only one of them.

2) I recall that with a 4-core (or was it 8?) Athlon in a single PC there was no acceleration beyond 3 cores. In my case the limitation was most probably memory bandwidth, which does not seem to be your case; in any event, I would still see all cores running at almost 100%.

3) Are you sure that you are timing the actual simulation and not the initialization? Populating the memory with the epsilon values is not necessarily balanced across processes: each core only fills its own part of the simulation space. If that part is uniformly filled, it completes quickly; if it contains structure, especially with subpixel averaging turned on, it may take much longer, during which the other cores simply wait. Print some debug information such as "structure initialization started", "simulation started", etc., and compare the timing distribution (a sketch follows below). From "runtime0=2" I conclude that your simulation time is actually rather short.
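For point 3, here is a minimal timing sketch in the Scheme/ctl interface. It assumes your geometry, sources and a runtime0 parameter are already defined in slab3D.ctl (the names here are only placeholders), that your Meep build provides (init-fields) to build the structure separately from the time stepping, and that Guile's get-internal-real-time is available; please adapt it to your own control file.

    ; Timing sketch: separate structure initialization from time stepping.
    ; Assumes geometry/sources and (define-param runtime0 2) appear above.
    (define (wall-time)
      ; wall-clock seconds as a real number, using Guile's timer
      (/ (get-internal-real-time) internal-time-units-per-second))

    (define t0 (wall-time))
    (print "structure initialization started\n")

    (init-fields)   ; forces the structure (epsilon) and fields to be built

    (define t1 (wall-time))
    (print "simulation started; initialization took " (- t1 t0) " s\n")

    (run-until runtime0)   ; the actual FDTD time stepping

    (define t2 (wall-time))
    (print "simulation finished; time stepping took " (- t2 t1) " s\n")

If the initialization figure dominates and barely shrinks as you add processors, the poor scaling is in building the structure (subpixel averaging) rather than in the time stepping itself.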
With best regards,
Shawkat Nizamov

2010/7/15, [email protected] <[email protected]>:
> Dear Meep users and developers,
>
> I'm getting strange scaling performance using meep-mpi compiled with
> IntelMPI on our cluster. When I go from 1 to 2 processors, I get
> almost ideal scaling (i.e. the runtime is divided by almost 2, as shown
> below for various problem sizes), but the scaling becomes very weak
> when using more than 2 processors. I should say that the meep-mpi
> results agree with the ones I get on my PC with meep-serial (in other
> words, our compilation seems all right).
>
> nb_proc  runtime-res=20  runtime-res=40  runtime-res=60  runtime-res=80
> 1        20.5            135             449             1086
> 2        11.47           73              230             551
> 4        11.52           68              219             530
> 8        12.9            67              222             528
>
> Let's go into some more detail with a job size of ~3 GB (a 3D problem).
> I am showing below the stats obtained when requesting 4 processors:
>
> mpirun -np 4 meep-mpi res=100 runtime0=2 norm-run?=true slab3D.ctl
>
> -------------------------------------------------------------------------
> Mem:  16411088k total, 4015216k used, 12395872k free,    256k buffers
> Swap:        0k total,       0k used,        0k free, 283692k cached
>
>   PID PR NI  VIRT  RES  SHR S  %CPU %MEM   TIME+ P COMMAND
> 18175 25  0  353m 221m 6080 R  99.8  1.4 1:10.41 1 meep-mpi
> 18174 25  0  354m 222m 6388 R 100.2  1.4 1:10.41 6 meep-mpi
> 18172 25  0 1140m 1.0g 7016 R  99.8  6.3 1:10.41 2 meep-mpi
> 18173 25  0 1140m 1.0g 6804 R  99.5  6.3 1:10.40 4 meep-mpi
>
> Tasks: 228 total, 5 running, 222 sleeping, 0 stopped, 1 zombie
> Cpu1 : 23.9%us, 76.1%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6 : 23.3%us, 76.7%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 99.7%us,  0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu4 : 99.7%us,  0.3%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> [...]
> -------------------------------------------------------------------------
>
> So what we see here is that, while the processors are all running flat
> out, for CPUs 1 and 6 (the two processes that are light on memory)
> only about 1/4 of the time is spent in user code and 3/4 in system
> time -- normally I/O, but here probably MPI communication. That
> explains why I don't get shorter runtimes with more than 2 processors.
>
> So we have a fairly clear load-balance issue. Have you experienced
> this kind of situation? I was wondering whether there are meep-mpi
> parameters I can set to affect the domain decomposition into chunks
> in a helpful way.
>
> I can send more details if needed.
>
> Thanks in advance!
>
> Best regards,
>
> Guillaume Demésy
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>
> _______________________________________________
> meep-discuss mailing list
> [email protected]
> http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/meep-discuss

