Hi Markus,

There are two MCA params that can help you, I believe:
1. You can set the maximum size of the shared memory file with

       -mca mpool_sm_max_size xxx

   where xxx is the maximum file size you want, expressed in bytes. The
   default value I see is 512 MBytes.

2. You can set the per-peer size of the file, again in bytes:

       -mca mpool_sm_per_peer_size xxx

   This will allocate a file of size xxx * num_procs_on_the_node on each
   node, up to the maximum file size (either the 512 MB default or
   whatever you specified using the previous param). This defaults to
   32 MBytes/proc.

I see that there is also a minimum (total, not per-proc) file size that
defaults to 128 MBytes. If that is still too large, you can adjust it with

       -mca mpool_sm_min_size yyy
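For example (untested, and the numbers here are purely illustrative -- pick
values that match what you actually want to allow per node), something like

    mpirun -np 16 \
        -mca mpool_sm_min_size      67108864  \
        -mca mpool_sm_per_peer_size 16777216  \
        -mca mpool_sm_max_size     268435456  \
        ./my_test_program

should cap the shared memory backing file at 256 MBytes total (16 MBytes
per process, 64 MBytes minimum) instead of the 512/32/128 MByte defaults.
You should also be able to see what your build actually uses with
"ompi_info --param mpool sm".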
Hope that helps,
Ralph


On 6/10/07 2:55 PM, "Markus Daene" <markus.da...@physik.uni-halle.de> wrote:

> Dear all,
>
> I hope I am in the correct mailing list with my problem.
> I am trying to run Open MPI with the gridengine (6.0u10, 6.1). For that I
> compiled Open MPI (1.2.2), which has the gridengine support included; I
> have checked it with ompi_info. In principle, Open MPI runs well.
> The gridengine is configured such that the user has to specify the memory
> consumption via the h_vmem option. I noticed that with a larger number of
> processes the job is killed by the gridengine for taking too much memory.
> To take a closer look at this, I wrote a small and simple (Fortran) MPI
> program which has just an MPI_Init and a (static) array, in my case of
> 50MB; the program then goes into an (infinite) loop, because it takes
> some time until the gridengine reports the maxvmem.
> I found that if the processes all run on different nodes, there is only
> an offset per process, i.e. the total scales linearly. But it becomes
> worse when the jobs run on one node. There the scaling seems to be
> quadratic in the number of processes, with a factor of about 30MB in my
> case. I made a list of the virtual memory reported by the gridengine,
> running on a 16-processor node:
>
>   #N proc   virt. mem [MB]
>       1          182
>       2          468
>       3          825
>       4         1065
>       5         1001
>       6         1378
>       7         1817
>       8         2303
>      12         4927
>      16         8559
>
> The pure program should need N*50MB; for 16 processes that is only 800MB,
> but it takes 10 times more, >7GB!!!
> Of course the gridengine will kill the job if this overhead is not taken
> into account, because of too much virtual memory consumption. The memory
> consumption is not related to the gridengine; it is the same if I run
> from the command line.
> I guess it might be related to the 'sm' component of the btl.
> Is it possible to avoid the quadratic scaling? Of course I could use the
> mvapi/tcp components only, like
>   mpirun --mca btl mvapi -np 16 ./my_test_program
> In this case the virtual memory is fine, but it is not what one wants on
> an SMP node.
>
> Then it becomes even worse:
> Open MPI nicely reports the (max./actual) used virtual memory to the
> gridengine as the sum over all processes. This value is then compared
> with the one the user has specified with the h_vmem option, but the
> gridengine takes this value per process for the allocation of the job
> (which works) and does not multiply it by the number of processes. Maybe
> one should report this to the gridengine mailing list, but it could be
> related to the Open MPI interface as well.
>
> The last thing I noticed:
> It seems that if the h_vmem option for gridengine jobs is specified like
> '2.0G', my test job was immediately killed; but when I specify '2000M'
> (which is obviously less) it works. The gridengine always puts the job on
> the correct node as requested, but I think there might be a problem in
> the Open MPI interface.
>
> It would be nice if someone could give some hints how to avoid the
> quadratic scaling, or maybe to think about whether this is really
> necessary in Open MPI.
>
> Thanks.
> Markus Daene
>
>
> my compiling options:
> ./configure --prefix=/not_important --enable-static
>   --with-f90-size=medium --with-f90-max-array-dim=7
>   --with-mpi-param-check=always --enable-cxx-exceptions --with-mvapi
>   --enable-mca-no-build=btl-tcp
>
> ompi_info output:
> Open MPI: 1.2.2
> Open MPI SVN revision: r14613
> Open RTE: 1.2.2
> Open RTE SVN revision: r14613
> OPAL: 1.2.2
> OPAL SVN revision: r14613
> Prefix: /usrurz/openmpi/1.2.2/pathscale_3.0
> Configured architecture: x86_64-unknown-linux-gnu
> Configured by: root
> Configured on: Mon Jun 4 16:04:38 CEST 2007
> Configure host: GE1N01
> Built by: root
> Built on: Mon Jun 4 16:09:37 CEST 2007
> Built host: GE1N01
> C bindings: yes
> C++ bindings: yes
> Fortran77 bindings: yes (all)
> Fortran90 bindings: yes
> Fortran90 bindings size: small
> C compiler: pathcc
> C compiler absolute: /usrurz/pathscale/bin/pathcc
> C++ compiler: pathCC
> C++ compiler absolute: /usrurz/pathscale/bin/pathCC
> Fortran77 compiler: pathf90
> Fortran77 compiler abs: /usrurz/pathscale/bin/pathf90
> Fortran90 compiler: pathf90
> Fortran90 compiler abs: /usrurz/pathscale/bin/pathf90
> C profiling: yes
> C++ profiling: yes
> Fortran77 profiling: yes
> Fortran90 profiling: yes
> C++ exceptions: yes
> Thread support: posix (mpi: no, progress: no)
> Internal debug support: no
> MPI parameter check: always
> Memory profiling support: no
> Memory debugging support: no
> libltdl support: yes
> Heterogeneous support: yes
> mpirun default --prefix: no
> MCA backtrace: execinfo (MCA v1.0, API v1.0, Component v1.2.2)
> MCA memory: ptmalloc2 (MCA v1.0, API v1.0, Component v1.2.2)
> MCA paffinity: linux (MCA v1.0, API v1.0, Component v1.2.2)
> MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.2)
> MCA maffinity: libnuma (MCA v1.0, API v1.0, Component v1.2.2)
> MCA timer: linux (MCA v1.0, API v1.0, Component v1.2.2)
> MCA installdirs: env (MCA v1.0, API v1.0, Component v1.2.2)
> MCA installdirs: config (MCA v1.0, API v1.0, Component v1.2.2)
> MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
> MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
> MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.2)
> MCA coll: self (MCA v1.0, API v1.0, Component v1.2.2)
> MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.2)
> MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.2)
> MCA io: romio (MCA v1.0, API v1.0, Component v1.2.2)
> MCA mpool: rdma (MCA v1.0, API v1.0, Component v1.2.2)
> MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.2)
> MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.2)
> MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.2)
> MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.2)
> MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.2)
> MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.2)
> MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.2)
> MCA btl: mvapi (MCA v1.0, API v1.0.1, Component v1.2.2)
> MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.2)
> MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.2)
> MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.2)
> MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.2)
> MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.2)
> MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.2)
> MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.2)
> MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.2)
> MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.2)
> MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.2)
> MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.2)
> MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.2)
> MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
> MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.2)
> MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.2)
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.2)
> MCA ras: slurm (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.2)
> MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.2)
> MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.2)
> MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.2)
> MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.2)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.2)
> MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.2)
> MCA pls: slurm (MCA v1.0, API v1.3, Component v1.2.2)
> MCA sds: env (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.2)
> MCA sds: slurm (MCA v1.0, API v1.0, Component v1.2.2)
>
> ----------------------------------------------------------
> Markus Daene
> Martin Luther University Halle-Wittenberg
> Naturwissenschaftliche Fakultaet II
> Institute of Physics
> Von Seckendorff-Platz 1 (room 1.28)
> 06120 Halle
> Germany
>
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel