Which version of Open MPI are you using? We can figure out what's wrong once we have the output of "ompi_info" and "ompi_info --param all all".

I wonder whether some of that memory is related to the size of the shared memory file. The size of the shared memory file is controlled by the MCA parameter mpool_sm_per_peer_size, which defaults to 128MB per local peer. Running 2048 procs on 256 nodes means 8 procs per node, i.e. at least 1GB per node just for the SM file. The problem right now with the SM file is that we are not reusing buffers; we use a new fragment every time we send a message, which eventually forces the OS to map the entire file.
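Back-of-the-envelope, the sizing works out to something like the sketch below (this assumes the file grows simply as per-peer size times the number of local peers; the real mpool_sm allocation also reserves bookkeeping space, so treat it as a lower bound):

#include <stdio.h>

/* Rough estimate of the per-node shared memory (SM) file size, assuming it
 * grows as mpool_sm_per_peer_size times the number of local peers. The real
 * mpool_sm allocation also reserves control/bookkeeping space, so this is a
 * lower bound, not an exact figure. */
int main(void)
{
    const long long per_peer_bytes = 128LL * 1024 * 1024; /* default: 128MB per local peer */
    const int total_procs = 2048;
    const int nodes = 256;
    const int local_peers = total_procs / nodes;           /* 8 procs per node */

    const long long sm_file_bytes = per_peer_bytes * local_peers;
    printf("estimated SM file per node: %lld MB\n", sm_file_bytes / (1024 * 1024)); /* ~1024 MB */
    return 0;
}

Lowering mpool_sm_per_peer_size through the usual MCA parameter mechanism should shrink the file; if I remember correctly, "ompi_info --param mpool sm" will show the parameter and the units it expects.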

  george.

On Nov 27, 2006, at 8:21 PM, Matt Leininger wrote:

On Mon, 2006-11-27 at 16:45 -0800, Matt Leininger wrote:
Has anyone tested OMPI's alltoall at > 2000 MPI tasks? I'm seeing each
MPI task eat up > 1GB of memory (just for OMPI, not the app).

  I gathered some more data using the alltoall benchmark in mpiBench.
mpiBench is pretty smart about how large its buffers are.  I set it to
use <= 100MB.

 num nodes   num MPI tasks   system mem    mpibench buffer mem
    128          1024         1.0 GB             65 MB
    160          1280         1.2 GB             82 MB
    192          1536         1.4 GB             98 MB
    224          1792         1.6 GB             57 MB
    256          2048         1.6-1.8 GB       < 100 MB

The 256-node run was killed by the OOM killer for using too much memory.  For
all of these tests the OMPI alltoall is using 1 GB or more of system
memory.  I know LANL is looking into an optimized alltoall, but is anyone
looking into the scalability of the memory footprint?
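For reference, the application-side buffer sizing boils down to something like the sketch below (not mpiBench's actual code; the 100MB cap and MPI_BYTE datatype are just placeholders), which is why the benchmark's own buffers stay under 100MB per task even at 2048 tasks:

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Minimal alltoall allocation sketch (NOT mpiBench itself): cap the total
 * send buffer at MAX_BUF_BYTES and derive the per-destination message size
 * from the communicator size, so the application buffers stay bounded no
 * matter how many tasks run. Cap value and datatype are assumptions. */
#define MAX_BUF_BYTES (100LL * 1024 * 1024)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int ntasks;
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    long long per_task = MAX_BUF_BYTES / ntasks;   /* bytes sent to each rank */
    size_t total = (size_t)(per_task * ntasks);    /* <= 100MB per buffer */

    char *sendbuf = malloc(total);
    char *recvbuf = malloc(total);
    memset(sendbuf, 0, total);

    MPI_Alltoall(sendbuf, (int)per_task, MPI_BYTE,
                 recvbuf, (int)per_task, MPI_BYTE, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

So whatever is pushing each task past 1GB is coming from inside the library, not from these buffers.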

  Thanks,

  - Matt


