On Tuesday, 30.08.2011, at 12:34 -0600, John Peterson wrote:
> On Tue, Aug 30, 2011 at 12:23 PM, robert <[email protected]> wrote:
> >
> >> 32 nodes or 32 cores?  I don't know the details of your cluster so it
> >> may be obvious, but make sure you aren't accidentally running too many
> >> MPI processes on a given node.
> >>
> > As far as I understand it, it is:
> >
> > 1 node = 4 cores
> >
> > 4 GB/node
>
> This doesn't match the output of the top command you posted below.
> The total memory given there is 31985140 kilobytes = roughly 30.5
> gigabytes.
>
> Does the cluster you are on have a public information web page?  That
> would probably help clear things up...
>
> > For testing and learning I only used a partition of 32 nodes.
> > I have just changed to 128 nodes, but this doesn't change anything.
> >
> > If I am running into swap and I use --enable-parmesh, this wouldn't
> > change much (since I have one copy of the mesh per MPI process), right?
>
> The idea would be to run fewer processes per node.  For example, you
> could run 1 MPI process each on 128 different nodes; then each of the
> individual processes would have access to the full amount of RAM for
> the node.  The method for doing this is again cluster dependent; I
> don't know if it's possible on your particular cluster.

It is possible to run 1, 2, or 4 processes per node. If I run 2 or 4
processes per node I get:
  Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 107754020 bytes
  Error! ***Memory allocation failed for SetUpCoarseGraph: gdata. Requested size: 107754020 bytes
  Error!

For 1 process per node it works, but very, very slowly.

> > top - 20:19:21 up 35 days, 8:55, 51 users, load average: 0.01, 0.29, 0.45
> > Tasks: 399 total,   1 running, 397 sleeping,   1 stopped,   0 zombie
> > Cpu(s):  0.0%us,  0.2%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> > Mem:  31985140k total, 31158420k used,   826720k free,   274980k buffers
> > Swap:  8393952k total,      160k used,  8393792k free, 16572876k cached
> >
> >   PID USER    PR  NI   VIRT   RES   SHR S %CPU %MEM   TIME+  COMMAND
> >  2955 bodner  16   0   3392  1932  1244 R    1  0.0  0:00.69 top
> >  6602 bodner  15   0  14296  3248  1864 S    0  0.0  0:10.11 sshd
> >  2829 bodner  15   0  19604  3892  3092 S    0  0.0  0:00.17 mpirun
> >
> > The last one is the process of interest.
>
> Actually, none of these are interesting... we would need to see the
> actual processes that mpirun spawned.  That is, if you ran something
> like
>
> mpirun -np 4 ./foo
>
> you would need to look for the four instances of "foo" in the top
> output and see how much CPU/memory they are consuming.
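
To make the "look at the spawned processes" step easier, a small standalone
check like the one below might help. This is only a sketch (plain MPI plus
Linux's /proc/self/status, built with whatever MPI compiler wrapper the
cluster provides; the file and function names here are made up). Each rank
prints the node it landed on and its resident memory, so one run shows both
the rank-to-node placement and the per-rank memory use.

// rank_memory.cpp (example name): report host and resident memory per MPI rank.
// Build and run, e.g.:  mpicxx rank_memory.cpp -o rank_memory && mpirun -np 4 ./rank_memory
#include <mpi.h>
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <string>

// Read VmRSS (resident set size, in kB) from /proc/self/status (Linux only).
static long resident_kb()
{
  std::ifstream status("/proc/self/status");
  std::string line;
  while (std::getline(status, line))
    if (line.compare(0, 6, "VmRSS:") == 0)
      return std::atol(line.c_str() + 6);
  return -1; // /proc not available
}

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);

  int rank = 0, size = 1;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  char host[MPI_MAX_PROCESSOR_NAME];
  int len = 0;
  MPI_Get_processor_name(host, &len);

  // Take turns printing so the output is (usually) in rank order.
  for (int r = 0; r < size; ++r)
  {
    if (r == rank)
      std::printf("rank %d of %d on %s: VmRSS = %ld kB\n",
                  rank, size, host, resident_kb());
    MPI_Barrier(MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}

This says nothing about the libmesh run itself beyond placement; for the real
job you would still watch the actual solver processes in top as John
describes, though the same VmRSS trick can be called from inside your own
code if you want the numbers printed from within the application.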

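On John's point about running fewer processes per node: how you request that
is indeed cluster dependent, so the following are only the kind of options to
look for, not something I know exists on your system. With Open MPI the
launcher accepts something like "mpirun -np 128 -npernode 1 ./myapp";
MPICH2's Hydra mpiexec has an equivalent "-ppn 1"; and if the cluster uses
PBS/Torque, the processes-per-node count is usually part of the resource
request itself, e.g. "-l nodes=128:ppn=1" ("./myapp" is just a placeholder
for your actual executable). The cluster documentation or admins will know
which of these, if any, applies.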