On Mon, Apr 15, 2013 at 7:13 AM, Glenn Fowler <[email protected]> wrote:
>
> On Mon, 15 Apr 2013 03:07:41 +0200 Lionel Cons wrote:
>> Based on the recent discussion about using mmap() for reading the
>> results of command substitutions, I did some testing and found that on
>> Solaris (Solaris 11, 64-bit build) ksh93 still does not behave
>> optimally. The primary problem I see is that MANY mmap() calls with a
>> very small map size (524288 bytes) are executed, instead of either
>> mapping the input file in one large chunk or at least using a chunk
>> size large enough that the system can use large pages (2M for x86,
>> 4M/32M/256M for SPARC64) where possible. Using a chunk size of
>> 524288 bytes is a joke.
>
>> Is there a specific reason why the code in sfrd.c only maps such
>> small chunks (I'd expect that a 64-bit process could easily map 16GB
>> at a time) from a file, or is this a bug?
>
> provide some iffe code that spits out the optimal mmap() page size
> for the current os/arch/configuration and that can be rolled into sfio

Erm... the "page size" (=the size used for MMU pages) is IMHO the
wrong property because it (usually) has to be chosen by the kernel
based on { MMU type, supported page sizes, available continuous memory
(as backing store) ... and for I/O the IOMMU page size and preferred
page size for the matching I/O device }.
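
FWIW, the set of page sizes a platform supports can at least be
enumerated from userland; which one the kernel actually uses for a
given mapping remains its decision. A minimal sketch, assuming
Solaris's getpagesizes(3C) (sysconf(_SC_PAGESIZE) is the portable
fallback and only reveals the base page size):
-- snip --
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>	/* getpagesizes() lives here on Solaris */

int main(void)
{
#ifdef __sun
	size_t	sizes[32];
	int	i, n;

	/* report every MMU page size the platform supports */
	if ((n = getpagesizes(sizes, (int)(sizeof(sizes)/sizeof(sizes[0])))) > 0)
	{	for (i = 0; i < n; i++)
			printf("supported page size: %zu\n", sizes[i]);
		return 0;
	}
#endif
	/* portable fallback: only the base page size is visible */
	printf("base page size: %ld\n", sysconf(_SC_PAGESIZE));
	return 0;
}
-- snip --
Something like that could probably be wrapped into an iffe probe.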

The issue here is that the "chunk size" which sfio uses to |mmap()|
parts of a large file is very low and in most cases prevents the use
of large pages (at least on i386/AMD64, which only has 4096-byte and
2M/4M pages; other platforms have more choices... for example
UltraSPARC supports page sizes of 8192, 64k, 512k, 4M, 32M, 256M and
2G).
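
Just to make the connection between chunk size and page size explicit:
the kernel can only back a mapping with a large page when the mapping
covers at least one full page of that size (and is suitably aligned).
A hypothetical helper (sizes[] as enumerated above, assumed sorted
ascending):
-- snip --
/* hypothetical: largest supported page size usable for a chunk of
 * len bytes; sizes[0..n-1] sorted ascending (as getpagesizes()
 * returns them) */
static size_t largest_page_for(size_t len, const size_t *sizes, int n)
{	size_t	best = sizes[0];
	int	i;

	for (i = 0; i < n; i++)
		if (sizes[i] <= len)
			best = sizes[i];
	return best;
}
/* len==524288 on AMD64 ({4096,2M}) yields only 4096; on UltraSPARC it
 * yields 512k at best, while a 16GB chunk could use 2G pages */
-- snip --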

I did some digging and found that the following patch fixes the issue
for 64-bit builds:
-- snip --
--- original/src/lib/libast/sfio/sfrd.c 2012-09-24 20:11:06.000000000 +0200
+++ build_i386_64bit_debug/src/lib/libast/sfio/sfrd.c   2013-04-15 03:24:22.892159982 +0200
@@ -161,18 +161,20 @@

                        /* make sure current position is page aligned */
                        if((a = (size_t)(f->here%_Sfpage)) != 0)
                        {       f->here -= a;
                                r += a;
                        }

                        /* map minimal requirement */
+#if _ptr_bits < 64
                        if(r > (round = (1 + (n+a)/f->size)*f->size) )
                                r = round;
+#endif

                        if(f->data)
                                SFMUNMAP(f, f->data, f->endb-f->data);

                        for(;;)
                        {       f->data = (uchar*)sysmmapf((caddr_t)0, (size_t)r,
                                                        (PROT_READ|PROT_WRITE),
                                                        MAP_PRIVATE,
-- snip --

... for 32-bit builds the problem is not easily fixable, because a
balance has to be struck between the available address space (4GB, of
which usually only 2GB are available for file mappings) and the
maximum number of open files (e.g. the value returned by $ ulimit -n).
Using that with nfiles==1024 gives a maximum chunk size of
$(( (pow(2,32)/2) / 1024. ))==2097152, which would be acceptable; but
nfiles==65536 gives a chunk size of
$(( (pow(2,32)/2) / 65536. ))==32768, which renders the advantage of
using |mmap()| useless.
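
The same arithmetic in C, just to make the 32-bit budget explicit (the
names are made up for illustration):
-- snip --
#include <stdio.h>

/* hypothetical: per-file chunk budget for a 32-bit process that can
 * spend half of its 4GB address space on file mappings */
static unsigned long chunk_budget(unsigned long nfiles)
{	return (1UL << 31) / nfiles;
}

int main(void)
{	printf("%lu\n", chunk_budget(1024));	/* 2097152 (2M): acceptable */
	printf("%lu\n", chunk_budget(65536));	/* 32768 (32k): useless */
	return 0;
}
-- snip --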

Based on that I'd suggest the following solution (a sketch combining
both points follows below):
1. Take the patch above to allow 64-bit libast consumers to use an
"unlimited" chunk size when mapping. This will work in _any_ case
because a) the 64-bit address space is vast and b) |sfrd()| will retry
with half the chunk size if the previous attempt to |mmap()| fails.
Using an "unlimited" chunk size allows the kernel to pick the best MMU
page size available (and reduces the syscall overhead to almost zero).

Optionally we could "clamp" the chunk size to 44bits (which allows
65536 files opened with 44bit chunks open (while still being able to
use multiple 256G MMU pages for each file mapping) and still having
lots of free virtual address space for memory and stack)

2. Optionally, for 32-bit processes we should add low and high
"limits" for the chunk size... it should *never* be below 4M and never
be higher than $(( (pow(2,32)/2) / nfiles )) (unless the file itself
is smaller than 4M).
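
Here is a minimal sketch of the combined policy from 1. and 2. (the
names CHUNK_MIN/pick_chunk/map_chunk are made up; _ptr_bits is the AST
macro used in the patch above, and the halving-on-failure loop mirrors
what |sfrd()| already does):
-- snip --
#include <sys/mman.h>
#include <unistd.h>

#define CHUNK_MIN	(4UL*1024*1024)	/* 32-bit: never below 4M */
#define CHUNK_MAX_64	(1ULL << 44)	/* optional 44-bit clamp  */

/* hypothetical: pick the mmap() chunk size for 'remaining' unread
 * bytes, given the soft open-file limit ("ulimit -n") */
static size_t pick_chunk(unsigned long long remaining, unsigned long nfiles)
{
#if _ptr_bits >= 64
	/* 64-bit: "unlimited", optionally clamped to 44 bits */
	(void)nfiles;	/* the per-file budget only matters on 32-bit */
	return remaining > CHUNK_MAX_64 ? (size_t)CHUNK_MAX_64 : (size_t)remaining;
#else
	/* 32-bit: half the 4GB address space divided among nfiles */
	size_t	chunk = (size_t)((1UL << 31) / nfiles);

	if (chunk < CHUNK_MIN)
		chunk = CHUNK_MIN;		/* low limit: 4M */
	if (remaining < chunk)
		chunk = (size_t)remaining;	/* small files map whole */
	return chunk;
#endif
}

/* map one chunk, halving the size and retrying on failure as sfrd()
 * does */
static void *map_chunk(int fd, off_t off, size_t *len)
{	void	*p;

	while ((p = mmap(NULL, *len, PROT_READ|PROT_WRITE,
			MAP_PRIVATE, fd, off)) == MAP_FAILED)
	{	if (*len <= (size_t)sysconf(_SC_PAGESIZE))
			return (void*)0;	/* give up, caller read()s */
		*len /= 2;
	}
	return p;
}
-- snip --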

Does that sound reasonable?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)