Arrgh -- if only the Linux kernel community had accepted ummunotify, this would now be a moot point (i.e., the argument would be solely with the OS/glibc, not the MPI!).
On Jan 9, 2010, at 10:45 PM, Barrett, Brian W wrote:

> We should absolutely not change this. For simple applications, yes, things
> work if large blocks are allocated on the heap. However, ptmalloc (and most
> allocators, really) can't rationally cope with repeated allocations and
> deallocations of large blocks. It would be *really bad* (as we've seen
> before) to change the behavior of our version of ptmalloc from that which is
> provided by Linux. Pain and suffering is all that path has ever led to.
>
> Just my $0.02, of course.
>
> Brian
>
> ________________________________________
> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of
> Eugene Loh [eugene....@sun.com]
> Sent: Saturday, January 09, 2010 9:55 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)
>
> Jeff Squyres wrote:
>
> >I'm not sure I follow -- are you saying that Open MPI is disabling the large
> >mmap allocations, and we shouldn't?
>
> Basically the reverse. The default (I think this means Linux, whether
> with gcc, gfortran, Sun f90, etc.) is to use mmap to malloc large
> allocations. We don't change this, but arguably we should.
>
> Try this:
>
> #include <stdlib.h>
> #include <stdio.h>
>
> int main(int argc, char **argv) {
>     size_t size, nextsize;
>     char *ptr, *nextptr;   /* char * so the pointer difference is legal C */
>
>     /* Grow the request ~10% per iteration and watch where the
>        returned pointers land. */
>     size = 1;
>     ptr = malloc(size);
>     while ( size < 1000000 ) {
>         nextsize = 1.1 * size + 1;
>         nextptr = malloc(nextsize);
>         printf("%9zu %18zx %18lx %18lx\n", size, size,
>                (unsigned long)(nextptr - ptr), (unsigned long)ptr);
>         size = nextsize;
>         ptr = nextptr;
>     }
>
>     return 0;
> }
>
> Here is sample output:
>
>   # bytes       #bytes (hex)       #bytes (hex)          ptr (hex)
>                                     to next ptr
>
>     58279               e3a7               e3b0             58f870
>     64107               fa6b               fa80             59dc20
>     70518              11376              11380             5ad6a0
>     77570              12f02              12f10             5bea20
>     85328              14d50              14d60             5d1930
>     93861              16ea5              16eb0             5e6690
>    103248              19350              19360             5fd540
>    113573              1bba5              1bbb0             6168a0
>    124931              1e803       2b3044655bc0             632450
>    137425              218d1              22000       2b3044c88010
>    151168              24e80              25000       2b3044caa010
>    166285              2898d              29000       2b3044ccf010
>    182914              2ca82              2d000       2b3044cf8010
>    201206              311f6             294000       2b3044d25010
>    221327              3608f              37000       2b3044fb9010
>    243460              3b704              3c000       2b3044ff0010
>
> So, below 128K, pointers are allocated at successively higher
> addresses, each one just barely far enough along to make room for the
> previous allocation. E.g., an allocation of 0xE3A7 bytes pushes the
> "high-water mark" up by 0xE3B0.
>
> Beyond 128K, allocations are page aligned. The pointers all
> end in 0x010. That is, whole numbers of pages are allocated and the
> returned address is 16 bytes (0x10) into the first page. The size of
> each allocation is the requested amount, plus a few bytes of padding,
> rounded up to the nearest whole-page multiple.
>
> The motivation to change, in my case, is performance. I don't know how
> widespread this problem is, but...
>
> >On Jan 8, 2010, at 9:25 AM, Sylvain Jeaugey wrote:
> >
> >>On Thu, 7 Jan 2010, Eugene Loh wrote:
> >>
> >>>setenv MALLOC_MMAP_MAX_ 0
> >>>setenv MALLOC_TRIM_THRESHOLD_ -1
> >>
> >>But yes, this set of settings is the number one tweak on HPC code that I'm
> >>aware of.
>
> Wow! I might vote for "compiling with -O", but let's not pick nits here.
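For reference, the same two knobs Sylvain mentions can also be set from
inside a program via glibc's mallopt() interface instead of the MALLOC_*_
environment variables. A minimal sketch, assuming glibc and ideally run
before the first large allocation; the 1 MB test malloc at the end is just
for illustration:

#include <malloc.h>   /* glibc-specific: mallopt() and the M_* constants */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Same effect as MALLOC_MMAP_MAX_=0: never satisfy malloc() with
       mmap(), so large blocks come from the sbrk heap instead. */
    if (mallopt(M_MMAP_MAX, 0) != 1)
        fprintf(stderr, "mallopt(M_MMAP_MAX) failed\n");

    /* Same effect as MALLOC_TRIM_THRESHOLD_=-1: never give freed heap
       memory back to the kernel, so large blocks stay mapped for reuse. */
    if (mallopt(M_TRIM_THRESHOLD, -1) != 1)
        fprintf(stderr, "mallopt(M_TRIM_THRESHOLD) failed\n");

    /* Past the default ~128K threshold this request would normally be
       mmap()ed; with the settings above it should come from the heap. */
    void *p = malloc(1 << 20);
    printf("1 MB block at %p\n", p);
    free(p);
    return 0;
}

The trade-off is the one Brian describes: freed large blocks are never
handed back to the kernel (which is typically why HPC codes want this), but
the sbrk heap can only shrink from the top, so repeated large allocations
and deallocations can inflate the process footprint.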
--
Jeff Squyres
jsquy...@cisco.com