Arrgh -- if only the Linux kernel community had accepted ummunotify, this would 
now be a moot point (i.e., the argument would be solely with the OS/glibc, not 
the MPI!).


On Jan 9, 2010, at 10:45 PM, Barrett, Brian W wrote:

> We should absolutely not change this.  For simple applications, yes, things 
> work if large blocks are allocated on the heap.  However, ptmalloc (and most 
> allocators, really), can't rationally cope with repeated allocations and 
> deallocations of large blocks.  It would be *really bad* (as we've seen 
> before) to change the behavior of our version of ptmalloc from that which is 
> provided by Linux.  Pain and suffering is all that path has ever led to.
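> 
> A minimal sketch of the kind of churn in question (illustrative only, 
> not a benchmark from this thread): large buffers with interleaved 
> lifetimes are allocated and freed over and over, with small long-lived 
> allocations in between.  How badly this fragments depends on the 
> allocator, but it is exactly the workload where an sbrk-only heap 
> (mmap allocation disabled) has historically struggled:
> 
> #include <stdlib.h>
> #include <string.h>
> 
> int main(void) {
>   void *keep[1000];
>   int   i;
> 
>   for (i = 0; i < 1000; i++) {
>     void *big = malloc((size_t)(1 + i % 7) * 1024 * 1024); /* 1-7 MB */
>     keep[i]   = malloc(64);          /* small block, kept alive      */
>     memset(big, 0, 4096);            /* touch it so pages are real   */
>     free(big);                       /* large block freed each pass  */
>   }
> 
>   for (i = 0; i < 1000; i++)
>     free(keep[i]);
>   return 0;
> }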
> 
> Just my $0.02, of course.
> 
> Brian
> 
> ________________________________________
> From: devel-boun...@open-mpi.org [devel-boun...@open-mpi.org] On Behalf Of 
> Eugene Loh [eugene....@sun.com]
> Sent: Saturday, January 09, 2010 9:55 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] MALLOC_MMAP_MAX (and MALLOC_MMAP_THRESHOLD)
> 
> Jeff Squyres wrote:
> 
> >I'm not sure I follow -- are you saying that Open MPI is disabling the large 
> >mmap allocations, and we shouldn't?
> >
> >
> Basically the reverse.  The default (I think this applies to Linux
> generally, whether with gcc, gfortran, Sun f90, etc.) is for malloc to
> use mmap for large allocations.  We don't change this, but arguably we
> should.
> 
> Try this:
> 
> #include <stdlib.h>
> #include <stdio.h>
> 
> int main(int argc, char **argv) {
>   size_t size, nextsize;
>   char  *ptr, *nextptr;
> 
>   size = 1;
>   ptr  = malloc(size);
>   while ( size < 1000000 ) {
>     nextsize = 1.1 * size + 1;
>     nextptr  = malloc(nextsize);
>     /* print the request size (decimal and hex), the distance to the
>        next pointer, and the returned address */
>     printf("%9zu %18zx %18lx %18lx\n", size, size,
>            (unsigned long) (nextptr - ptr), (unsigned long) ptr);
>     size = nextsize;
>     ptr  = nextptr;
>   }
> 
>   return 0;
> }
> 
> Here is sample output:
> 
>    # bytes        # bytes (hex)    # bytes to next       ptr (hex)
>                                       ptr (hex)
> 
>     58279               e3a7               e3b0             58f870
>     64107               fa6b               fa80             59dc20
>     70518              11376              11380             5ad6a0
>     77570              12f02              12f10             5bea20
>     85328              14d50              14d60             5d1930
>     93861              16ea5              16eb0             5e6690
>    103248              19350              19360             5fd540
>    113573              1bba5              1bbb0             6168a0
>    124931              1e803       2b3044655bc0             632450
>    137425              218d1              22000       2b3044c88010
>    151168              24e80              25000       2b3044caa010
>    166285              2898d              29000       2b3044ccf010
>    182914              2ca82              2d000       2b3044cf8010
>    201206              311f6             294000       2b3044d25010
>    221327              3608f              37000       2b3044fb9010
>    243460              3b704              3c000       2b3044ff0010
> 
> So, for allocations below 128K, pointers come back at successively
> higher addresses, each one just far enough beyond the last to make
> room for that allocation.  E.g., an allocation of 0xE3A7 bytes pushes
> the "high-water mark" up by 0xE3B0.
> 
> Beyond 128K, allocations are page aligned: the pointers all end in
> 0x010.  That is, a whole number of pages is allocated and the returned
> address is 16 bytes (0x10) into the first page.  The size of each
> allocation is the requested amount, plus a few bytes of padding,
> rounded up to the nearest whole multiple of the page size.
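> 
> That 128K cutoff is glibc's default mmap threshold.  A quick way to
> confirm it is the knob in play (a sketch assuming glibc's <malloc.h>
> interface, not part of the original test) is to move the threshold
> with mallopt() and rerun the loop above; the switch to page-aligned,
> mmap-style addresses should then happen at the new threshold instead
> of at 128K:
> 
> #include <malloc.h>    /* mallopt, M_MMAP_THRESHOLD (glibc)          */
> #include <stdlib.h>
> #include <stdio.h>
> 
> int main(void) {
>   void *p, *q;
> 
>   /* Raise the mmap threshold to 1 MB; allocations below that size
>      should then come from the sbrk heap, closely packed, rather than
>      from page-aligned mmap regions. */
>   mallopt(M_MMAP_THRESHOLD, 1024 * 1024);
> 
>   p = malloc(512 * 1024);            /* mmap'ed under the default    */
>   q = malloc(512 * 1024);
>   printf("%p %p\n", p, q);
>   free(p);
>   free(q);
>   return 0;
> }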
> 
> The motivation to change, in my case, is performance.  I don't know how
> widespread this problem is, but...
> 
> >On Jan 8, 2010, at 9:25 AM, Sylvain Jeaugey wrote:
> >
> >
> >>On Thu, 7 Jan 2010, Eugene Loh wrote:
> >>
> >>>setenv MALLOC_MMAP_MAX_        0
> >>>setenv MALLOC_TRIM_THRESHOLD_ -1
> >>>
> >>>
> >>But yes, this set of settings is the number one tweak on HPC code that I'm
> >>aware of.
> >>
> >>
> Wow!  I might vote for "compiling with -O", but let's not pick nits here.
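> 
> For reference, the same tweak can be made from inside the program --
> a sketch assuming glibc's mallopt() interface; the constants below are
> the programmatic counterparts of the MALLOC_MMAP_MAX_ and
> MALLOC_TRIM_THRESHOLD_ environment variables above, not anything Open
> MPI currently sets:
> 
> #include <malloc.h>    /* mallopt, M_MMAP_MAX, M_TRIM_THRESHOLD (glibc) */
> 
> int main(void) {
>   /* Like MALLOC_MMAP_MAX_=0: never satisfy malloc() with mmap, so
>      large blocks come from (and stay on) the sbrk heap.              */
>   mallopt(M_MMAP_MAX, 0);
> 
>   /* Like MALLOC_TRIM_THRESHOLD_=-1: never shrink the heap, so memory
>      freed at the top of the heap is not handed back to the kernel.   */
>   mallopt(M_TRIM_THRESHOLD, -1);
> 
>   /* ... application / MPI code would follow ... */
>   return 0;
> }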
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> 


-- 
Jeff Squyres
jsquy...@cisco.com

