On Mon, Apr 15, 2013 at 2:15 PM, Roland Mainz <[email protected]> wrote:
> On Mon, Apr 15, 2013 at 7:13 AM, Glenn Fowler <[email protected]> wrote:
>>
>> On Mon, 15 Apr 2013 03:07:41 +0200 Lionel Cons wrote:
>>> Based on the recent discussion about using mmap() for reading the
>>> results of command substitutions I did some testing and found that on
>>> Solaris (Solaris 11 and a 64bit build) ksh93 still does not behave
>>> optimally. The primary problem I see is that MANY mmap() calls with a
>>> very small map size (524288 bytes) are executed instead of either
>>> mapping the input file in one large chunk or at least using a chunk
>>> size large enough that the system can use large pages (2M for x86,
>>> 4M/32M/256M for SPARC64) if possible. Using a chunk size of 524288
>>> bytes is a joke.
>>
>>> Is there a specific reason why the code in sfrd.c only maps such
>>> small chunks (I'd expect that a 64bit process could easily map 16GB
>>> each time) from a file or is this a bug?
>>
>> provide some iffe code that spits out the optimal mmap() page size
>> for the current os/arch/configuration and that can be rolled into sfio
>
> Erm... the "page size" (=the size used for MMU pages) is IMHO the
> wrong property because it (usually) has to be chosen by the kernel
> based on { MMU type, supported page sizes, available contiguous memory
> (as backing store) ... and for I/O the IOMMU page size and preferred
> page size for the matching I/O device }.
>
> The issue here is that the "chunk size" which sfio uses to |mmap()|
> parts of a large file is very low and in most cases prevents the
> use of large pages (at least on i386/AMD64 which only has 4096 byte
> and 2M/4M pages (other platforms have more choices... for example
> UltraSPARC supports page sizes like 8192, 64k, 512k, 4M, 32M, 256M, 2G
> pages)).
>
> I did some digging and found that the following patch fixes the issue
> for 64bit builds:
> -- snip --
> --- original/src/lib/libast/sfio/sfrd.c 2012-09-24 20:11:06.000000000 +0200
> +++ build_i386_64bit_debug/src/lib/libast/sfio/sfrd.c 2013-04-15 03:24:22.892159982 +0200
> @@ -161,18 +161,20 @@
>
>          /* make sure current position is page aligned */
>          if((a = (size_t)(f->here%_Sfpage)) != 0)
>          {   f->here -= a;
>              r += a;
>          }
>
>          /* map minimal requirement */
> +#if _ptr_bits < 64
>          if(r > (round = (1 + (n+a)/f->size)*f->size) )
>              r = round;
> +#endif
>
>          if(f->data)
>              SFMUNMAP(f, f->data, f->endb-f->data);
>
>          for(;;)
>          {   f->data = (uchar*) sysmmapf((caddr_t)0, (size_t)r,
>                      (PROT_READ|PROT_WRITE),
>                      MAP_PRIVATE,
> -- snip --
>
> ... for 32bit builds the problem is not easily fixable because there
> has to be a balance between available address space (4GB... but only
> 2GB are usually available for file mappings) and maximum number of
> open files (e.g. the value returned by $ ulimit -n # ...if we use that
> with nfiles==1024 we get a maximum chunk size of
> $(( (pow(2,32)/2) / 1024. )) == 2097152 (which would be acceptable) but
> for nfiles==65536 we get a chunk size of
> $(( (pow(2,32)/2) / 65536. )) == 32768 ... which renders the advantage
> of using |mmap()| useless).
>
> Based on that I'd suggest the following solution:
> 1. Take the patch above to let 64bit libast consumers use
> "unlimited" chunk size mapping. This will work in _any_ case because
> a) 64bit address space is vast and b) |sfrd()| will retry with half
> the chunk size if the previous attempt to |mmap()| fails.
> Using an "unlimited" chunk size allows the kernel to pick the best MMU
> page size available (and reduces the syscall overhead to almost zero).
>
> Optionally we could "clamp" the chunk size to 44bits (which allows
> 65536 files to be open with 44bit chunks (while still being able to
> use multiple 256G MMU pages for each file mapping) and still leaves
> lots of free virtual address space for memory and stack)
>
> 2. Optionally for 32bit processes we should add low and high "limits"
> for the chunk size... it should *never* be below 4M and not be higher
> than $(( (pow(2,32)/2) / nfiles )) (unless size is lower than 4M).
>
> Does that sound reasonable ?
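
The 32bit limits in [2] boil down to a small clamping calculation, which
can be sketched in C as below. This is only an illustration and not sfio
code: the names va_share() and chunk_limit_32() and both constants are
made up here, and the sketch assumes that the 4M floor wins whenever the
per-file share of the address space is smaller than 4M (the proposal does
not say which limit takes priority).

```c
#include <stdint.h>

/* Assumed constants, taken from the numbers in the proposal above. */
#define MIN_CHUNK ((uint64_t)4 << 20)   /* 4M lower limit */
#define USABLE_VA ((uint64_t)1 << 31)   /* pow(2,32)/2: half of a 32bit address space */

/* Hypothetical helper: address space available per open file. */
static uint64_t va_share(uint64_t nfiles)
{
    if (nfiles == 0)
        nfiles = 1;                     /* avoid dividing by zero */
    return USABLE_VA / nfiles;
}

/* Hypothetical helper: chunk size limit for a 32bit process. */
static uint64_t chunk_limit_32(uint64_t file_size, uint64_t nfiles)
{
    uint64_t high = va_share(nfiles);

    if (high < MIN_CHUNK)
        high = MIN_CHUNK;               /* assumption: never go below 4M */

    /* a file smaller than the limit is simply mapped whole */
    return file_size < high ? file_size : high;
}
```

With nfiles==1024 va_share() gives 2097152 and with nfiles==65536 it
gives 32768, matching the shell arithmetic quoted above; in both cases
chunk_limit_32() would then fall back to the 4M floor for a large file.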

Yes, I think the patch to exclude the rounding for 64bit platforms is
reasonable. I seriously doubt that 64bit platforms require extra
checks beyond what the code already does because it is unlikely (with
today's machines) to run into any limits with reasonable processing
time.

The 32bit limits you're proposing in [2] may require some benchmarking
but in the long run I doubt that 32bit platforms require such work -
anyone using 65536 file descriptors will likely use a 64bit address
space anyway.

Irek
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
