On Tuesday, 30.08.2011, at 12:34 -0600, John Peterson wrote:
> On Tue, Aug 30, 2011 at 12:23 PM, robert <[email protected]> wrote:
> >
> >> 32 nodes or 32 cores?  I don't know the details of your cluster so it
> >> may be obvious, but make sure you aren't accidentally running too many
> >> MPI processes on a given node.
> >>
> > As far as I understand it, it is:
> >
> > 1 node = 4 cores
> >
> > 4GB/node
> 
> This doesn't match the output of the top command you posted below.
> The total memory given there is 31 985 140 kilobytes = 30.5034065
> gigabytes.
> 
> Does the cluster you are on have a public information web page?  That
> would probably help clear things up...
> 
> 
> 
> > For testing and learning I only used a partition of 32 nodes.
> > I have just changed to 128 nodes but this doesn't change anything.
> >
> >
> > If I am running into swap and I use --enable-parmesh, this wouldn't
> > change much (since I have one copy of the mesh per MPI process), right?
> 
> The idea would be to run fewer processes per node.  For example, you
> could run 1 MPI process each on 128 different nodes, then each of the
> individual processes would have access to the full amount of RAM for
> the node.  The method for doing this is again cluster dependent; I
> don't know if it's possible on your particular cluster.
> 
It is possible to run 1, 2, or 4 processes per node. If I run 2 or 4 processes
per node, I get:

Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 107754020 bytes
Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 107754020 bytes
Error!

For 1 process per node it works, but very, very slowly.
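
The way to request this placement seems to be cluster dependent, so the exact
options below are just my guesses, not necessarily what our scheduler uses.
With Open MPI something like

  mpirun --npernode 1 -np 128 ./my_app

or, in a PBS/Torque batch script, a resource line such as

  #PBS -l nodes=128:ppn=1

should give each MPI process a whole node (and thus all of its RAM) to itself.
(./my_app is just a placeholder for the actual executable.)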


> 
> 
> > top - 20:19:21 up 35 days,  8:55, 51 users,  load average: 0.01, 0.29, 0.45
> > Tasks: 399 total,   1 running, 397 sleeping,   1 stopped,   0 zombie
> > Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> > Mem:  31985140k total, 31158420k used,   826720k free,   274980k buffers
> > Swap:  8393952k total,      160k used,  8393792k free, 16572876k cached
> >
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  2955 bodner    16   0  3392 1932 1244 R    1  0.0   0:00.69 top
> >  6602 bodner    15   0 14296 3248 1864 S    0  0.0   0:10.11 sshd
> >  2829 bodner    15   0 19604 3892 3092 S    0  0.0   0:00.17 mpirun
> >
> > The last one is the process of interest.
> 
> Actually none of these are interesting... we would need to see the
> actual processes that mpirun spawned.  That is, if you ran something
> like
> 
> mpirun -np 4 ./foo
> 
> you would need to look for the four instances of "foo" in the top
> output and see how much CPU/memory they are consuming.
> 
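
Next time I will try to pick out the spawned processes directly. Assuming the
executable is called ex1 (just a placeholder name), something like

  top -u bodner
  ps -u bodner -o pid,rss,vsz,comm | grep ex1

should show the resident (RES/RSS) and virtual (VIRT/VSZ) memory of each
process that mpirun spawned.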


