Rolf, I don't think it is a good idea to increase the default value to 2G. Keep in mind that not many people have a machine with 128 or more cores on a single node. Most people will have nodes with 2, 4, or maybe 8 cores, so there is no need to set this parameter to such a high value. In the end this memory gets allocated on every node, and if you have only 4 or 8G per node that is out of proportion. For my 8-core nodes I have even decreased the sm_max_size to 32M and have had no problems with that.

As far as I know, this parameter is global unless it is overridden at runtime. So even if you run only 2 procs on your machine, it might still allocate the 2G for the sm (shared memory) module. Like Richard suggests, I would recommend setting the parameter for your machine in etc/openmpi-mca-params.conf rather than changing the default value.
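Just to make it concrete, this is roughly what I have in mind (the value and ./a.out are only placeholders; as far as I know a per-user $HOME/.openmpi/mca-params.conf is picked up as well):

    # in <prefix>/etc/openmpi-mca-params.conf, only on the big SMP:
    mpool_sm_max_size = 2147483647

    # or just for a single run, on the command line:
    mpirun -mca mpool_sm_max_size 2147483647 -np 128 ./a.out

That way the large segment is only requested where it is actually needed.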
Markus

Rolf vandeVaart wrote:
> We are running into a problem when running on one of our larger SMPs
> using the latest Open MPI v1.2 branch. We are trying to run a job
> with np=128 within a single node. We are seeing the following error:
>
> "SM failed to send message due to shortage of shared memory."
>
> We then increased the allowable maximum size of the shared segment to
> 2 Gigabytes - 1, which is the maximum allowed for a 32-bit application.
> We used the mca parameter to increase it as shown here.
>
> -mca mpool_sm_max_size 2147483647
>
> This allowed the program to run to completion. Therefore, we would
> like to increase the default maximum from 512 Mbytes to 2G-1.
> Does anyone have an objection to this change? Soon we are going to
> have larger CPU counts and would like to increase the odds that things
> work "out of the box" on these large SMPs.
>
> On a side note, I did a quick comparison of the shared memory needs of
> the old Sun ClusterTools to Open MPI and came up with this table.
>
>                                   Open MPI
>  np   Sun ClusterTools 6     current   suggested
> -------------------------------------------------
>   2          20M               128M      128M
>   4          20M               128M      128M
>   8          22M               256M      256M
>  16          27M               512M      512M
>  32          48M               512M      1G
>  64         133M               512M      2G-1
> 128         476M               512M      2G-1
>
