Hi all,
I am running 64-bit Ubuntu 9.04 (Jaunty) on a Mac Pro with two quad-core
Intel Xeon processors and 16 GB of RAM. The computer is set up to dual-boot
(not running Ubuntu in a VM or anything). My mpb/mpb-mpi version is 1.4.2,
my mpich version is 1.2.7-9, fftw2 is 2.1.3-22, and liblapack is 3.1.1-6;
I'm not sure how many of these details are necessary to know. Everything was
retrieved from the Ubuntu Jaunty package manager. Basically, as soon as the
operating system was installed, I typed "apt-get mpi" and "apt-get mpb-mpi"
and started running (this would be absolutely beautiful if the parallel
version ran correctly, and really it's still beautiful considering the
serial version does work just this easily... MEEP, too!).
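In case it helps, this is roughly how I pulled the version numbers above
(the package-name pattern is just my guess at which packages are relevant):

    dpkg -l | grep -Ei 'mpb|mpich|lam|fftw|lapack'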
For a simple 3D script, "foo.ctl" (which returns the correct frequencies and
fields in every one of the runs below, judging by comparison with serial MPB
runs on multiple machines), I get the following performance.
When solving for a single band of a single k-point, serial mpb uses 840 MB
of RAM and solves the k-point in 18 s. Running the same ctl file with
"mpirun -np X mpb-mpi foo.ctl" gets slower and uses more RAM as the number
of processes increases. For X = 1, 2, 4, 8, 16 (going up to 16 because each
core can handle 2 threads):

    np                   :  1      2     4     8     16
    RAM usage (GB)       :  0.840  1.0   1.5   2.3   4.1
    time per k-point (s) :  20     21    22    30    48

The RAM scaling fits an essentially perfect line: RAM_total = (220 MB)*np +
600 MB. So the RAM per process appears to be independent of the number of
processes (with a ~600 MB offset, presumably for the operating system),
whereas mpb-mpi should be dividing the memory needed by a single process
among all the nodes. The fact that the solution time increases with the
number of nodes obviously does not make sense either. The scaling behaves
analogously regardless of the number of k-points or of bands solved for per
k-point.
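For what it's worth, the timings can be reproduced with something along
these lines (the output file names are just placeholders); I read the RAM
numbers off of top while each run was going:

    for np in 1 2 4 8 16; do
        time mpirun -np $np mpb-mpi foo.ctl > foo-np$np.out
    done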
In addition, the mpb-mpi output contains some weirdness that I wouldn't
necessarily expect. In particular, the iteration output is printed
separately for each process. Since I have never run mpb-mpi before, I
figured this might just be how it reports progress. However, given the
time/RAM scaling, I thought it could be an indication that the program is
actually just running identical copies of the serial solver on every node.
For instance, when running with "-np 2":
********************************************
solve_kpoint (0.166667,0,0):
Solving for bands 1 to 1...
iteration 3: trace = 0.0197640759208349 (3.27591% change)
solve_kpoint (0.166667,0,0):
Solving for bands 1 to 1...
iteration 3: trace = 0.0197640759208349 (3.27591% change)
iteration 6: trace = 0.01967655636958654 (0.00696769% change)
iteration 6: trace = 0.01967655636958654 (0.00696769% change)
iteration 10: trace = 0.01967598583251337 (1.6045e-05% change)
iteration 10: trace = 0.01967598583251337 (1.6045e-05% change)
Finished solving for bands 1 to 1 after 11 iterations.
zevenyoddfreqs:, 2, 0.166667, 0, 0, 0.166667, 0.140271
elapsed time for k point: 16 seconds.
Finished solving for bands 1 to 1 after 11 iterations.
zevenyoddfreqs:, 2, 0.166667, 0, 0, 0.166667, 0.140271
elapsed time for k point: 16 seconds.
********************************************
This scales up with "np": the output contains np copies of each iteration
and frequency line. Also, these iteration printouts sometimes get out of
sync for large np, but the fields and frequencies always come out correctly
regardless. I can't tell how many times it outputs the fields or epsilon,
since the copies would all overwrite each other. Like I said, the fields and
frequencies it returns are correct; it's just the scaling that doesn't make
sense.
The runs always contain the following in the output as well:
********************************************
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
p0_14367: p4_error: interrupt SIGx: 15
Some deprecated features have been used. Set the environment
variable GUILE_WARN_DEPRECATED to "detailed" and rerun the
program to get more information. Set it to "no" to suppress
this message.
********************************************
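That message mentions LAM ("lamexec", "lambooted nodes"), while the p4_error
line looks like it comes from MPICH, so I am guessing that the mpirun in my
PATH and the MPI library mpb-mpi was linked against might be from different
MPI implementations. These are the checks I can run if they would be useful
(just my guess at a sensible diagnostic):

    which mpirun
    ldd $(which mpb-mpi) | grep -iE 'mpi|lam'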
In addition, mpb-split does not appear to work as described in the wiki
either. When I run "mpb-split num-split foo.ctl", it also looks as though it
just performs every calculation num-split times and overwrites the output
files as it goes.
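For completeness, this is the sort of invocation I am using, and how I
looked at the output files afterwards (the 4 and the *.h5 pattern are just
examples):

    mpb-split 4 foo.ctl
    ls -l *.h5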
Any help you could offer would be much appreciated, as these errors relegate
us to using the serial code only.
Thanks very much,
Matt Eichenfield
[email protected]
_______________________________________________
mpb-discuss mailing list
[email protected]
http://ab-initio.mit.edu/cgi-bin/mailman/listinfo/mpb-discuss