We need to worry about applications that require a large amount of memory outside of Sfio too. I know of a couple of local apps that routinely use 40-50GB of shared memory managed by Vmalloc and CDT. We wouldn't want to thrash them unnecessarily.
Phong

> From [email protected] Mon Apr 15 08:15:15 2013
> To: Glenn Fowler <[email protected]>
> Cc: [email protected]
> Subject: Re: [ast-developers] mmap() for command substitutions still not
>  living up to its fullest potential?
>
> On Mon, Apr 15, 2013 at 7:13 AM, Glenn Fowler <[email protected]> wrote:
> >
> > On Mon, 15 Apr 2013 03:07:41 +0200 Lionel Cons wrote:
> >> Based on the recent discussion about using mmap() for reading the
> >> results of command substitutions I did some testing and found that on
> >> Solaris (Solaris 11 and a 64bit build) ksh93 still does not behave
> >> optimally. The primary problem I see is that MANY mmap() calls with a
> >> very small map size (524288 bytes) are executed instead of either
> >> mapping the input file in one large chunk or at least using a chunk
> >> size large enough that the system can use largepages (2M for x86,
> >> 4M/32M/256M for SPARC64) if possible. Using a chunk size of 524288
> >> bytes is a joke.
> >
> >> Is there a specific reason why the code in sfrd.c only maps such
> >> small chunks (I'd expect that a 64bit process could easily map 16GB
> >> each time) from a file or is this a bug?
> >
> > provide some iffe code that spits out the optimal mmap() page size
> > for the current os/arch/configuration and that can be rolled into sfio
>
> Erm... the "page size" (=the size used for MMU pages) is IMHO the
> wrong property because it (usually) has to be chosen by the kernel
> based on { MMU type, supported page sizes, available contiguous memory
> (as backing store) ... and for I/O the IOMMU page size and preferred
> page size for the matching I/O device }.
>
> The issue here is that the "chunk size" which sfio uses to |mmap()|
> parts of a large file is very very low and prevents in most cases the
> use of large pages (at least on i386/AMD64 which only has 4096 byte
> and 2M/4M pages (other platforms have more choices... for example
> UltraSPARC supports page sizes like 8192, 64k, 512k, 4M, 32M, 256M, 2G
> pages)).
>
> I did some digging and found that the following patch fixes the issue
> for 64bit builds:
> -- snip --
> --- original/src/lib/libast/sfio/sfrd.c 2012-09-24 20:11:06.000000000 +0200
> +++ build_i386_64bit_debug/src/lib/libast/sfio/sfrd.c 2013-04-15 03:24:22.892159982 +0200
> @@ -161,18 +161,20 @@
>                 /* make sure current position is page aligned */
>                 if((a = (size_t)(f->here%_Sfpage)) != 0)
>                 {       f->here -= a;
>                         r += a;
>                 }
>                 /* map minimal requirement */
> +#if _ptr_bits < 64
>                 if(r > (round = (1 + (n+a)/f->size)*f->size) )
>                         r = round;
> +#endif
>                 if(f->data)
>                         SFMUNMAP(f, f->data, f->endb-f->data);
>                 for(;;)
>                 {       f->data = (uchar*) sysmmapf((caddr_t)0, (size_t)r,
>                                         (PROT_READ|PROT_WRITE),
>                                         MAP_PRIVATE,
> -- snip --
>
> ... for 32bit builds the problem is not easily fixable because there
> has to be a balance between available address space (4GB... but only
> 2GB are usually available for file mappings) and maximum number of
> open files (e.g. the value returned by $ ulimit -n # ...if we use that
> with nfiles==1024 we get a maximum chunk size of $(( (pow(2,32)/2) /
> 1024. ))==2097152 (which would be acceptable) but for nfiles==65536 we
> get a chunk size of $(( (pow(2,32)/2) / 65536. )) == 32768 ... which
> renders the advantage of using |mmap()| useless).
>
> Based on that I'd suggest the following solution:
> 1. Take the patch above to let 64bit libast consumers use
> "unlimited" chunk size mapping. This will work in _any_ case because
> a) 64bit address space is vast and b) |sfrd()| will retry with half
> the chunk size if the previous attempt to |mmap()| fails.
> Using an "unlimited" chunk size allows the kernel to pick the best MMU
> page size available (and reduces the syscall overhead to almost zero).
> Optionally we could "clamp" the chunk size to 44bits (which allows
> 65536 files to be open with 44bit chunks (while still being able to
> use multiple 256G MMU pages for each file mapping) and still leaves
> lots of free virtual address space for memory and stack)
> 2. Optionally for 32bit processes we should add low and high "limits"
> for the chunk size... it should *never* be below 4M and not be higher
> than $(( (pow(2,32)/2) / nfiles )) (unless size is lower than 4M).
>
> Does that sound reasonable ?
>
> ----
> Bye,
> Roland
>
> --
>   __ .  . __
>  (o.\ \/ /.o) [email protected]
>   \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
>   /O /==\ O\  TEL +49 641 3992797
>  (;O/ \/ \O;)
> _______________________________________________
> ast-developers mailing list
> [email protected]
> http://lists.research.att.com/mailman/listinfo/ast-developers
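
To put numbers on the 32bit limits proposed in the quoted message, here is a minimal C sketch. The names addr_space_share() and chunk_size_32() are hypothetical and not part of sfio; only the 2GB mapping budget and the 4M floor come from the message. The sketch assumes that "unless size is lower than 4M" refers to the file size (small files are mapped whole) and that the 4M floor wins whenever it conflicts with the per-file share of the address space.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; nothing here is sfio API. */

#define CHUNK_FLOOR   ((uint64_t)4 << 20)  /* proposed 4M lower limit */
#define MAP_SPACE_32  ((uint64_t)1 << 31)  /* pow(2,32)/2: 2GB assumed usable for file mappings */

/* Per-file share of the mapping space if all nfiles descriptors are mapped. */
uint64_t addr_space_share(uint64_t nfiles)
{
    if (nfiles == 0)
        nfiles = 1;                        /* avoid division by zero */
    return MAP_SPACE_32 / nfiles;
}

/* Chunk size for a 32bit process under the assumptions stated above. */
uint64_t chunk_size_32(uint64_t file_size, uint64_t nfiles)
{
    uint64_t high, chunk;

    /* Assumption: a file no larger than the floor is mapped in one piece. */
    if (file_size <= CHUNK_FLOOR)
        return file_size;

    high = addr_space_share(nfiles);
    chunk = file_size < high ? file_size : high;

    /* Assumption: the floor takes precedence over the per-file share. */
    if (chunk < CHUNK_FLOOR)
        chunk = CHUNK_FLOOR;
    return chunk;
}
```

addr_space_share(1024) gives 2097152 and addr_space_share(65536) gives 32768, the same figures as in the message. Under the assumptions above, a large file is raised to the 4M floor in both cases, so with many open files the mappings could together exceed the 2GB budget; whether that trade-off is acceptable is exactly the open question for 32bit builds.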
