On Mon, Apr 15, 2013 at 8:37 PM, Phong Vo <[email protected]> wrote:
>> From [email protected] Mon Apr 15 10:53:27 2013
>> Subject: Re: [ast-developers] mmap() for command substitutions still not
>> living up to its fullest potential?
>> To: Phong Vo <[email protected]>
>> Cc: [email protected], [email protected]
>
>> On Mon, Apr 15, 2013 at 4:40 PM, Phong Vo <[email protected]> wrote:
>> >
>> > The default size of the mapped memory for a stream is driven by a #define
>> > in a header file, so it isn't hard to change it. I believe an application
>> > can also use sfsetbuf(f, NULL, size) to set a desired size for the mapped
>> > buffer.
>> >
>> > Generally speaking, Sfio and our other core libraries like CDT and Vmalloc
>> > have been around for a very long time and their default parameters tend to
>> > stay as they are until someone notices a performance issue. Well, not so
>> > much for Vmalloc anymore, because it has been completely rewritten recently
>> > to deal with concurrency, both multiple threads and multiple processes for
>> > shared memory.
>> >
>> > Anyway, it is good that you bring up this issue with Sfio now.
>> > What do people think is a reasonable size to set the default mapped size
>> > to on a 64-bit machine? Keep in mind that there are apps with many dozens
>> > of files opened at the same time along with other large requirements for
>> > memory.
>
>> I think Roland's patch from
>> http://lists.research.att.com/pipermail/ast-developers/2013q2/002431.html
>> is sufficient for now because long before we exhaust VA space we run
>> out of patience waiting for the jobs to complete :) If that's not
>> sufficient, then I'll suggest the '44bit clamp' Roland proposed to
>> partition the 64bit VA space into 65536 files with 44bit chunks mapped
>> simultaneously. That still leaves 16 times that amount of memory
>> for anon memory pages (which is many times the amount of world-wide
>> installed main memory in 2012).
>
> Exhausting VA space is not likely, but keeping processes behaving nicely toward
> one another should be a good thing.
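
Before replying to the individual points: regarding the sfsetbuf(f, NULL, size)
suggestion above, a minimal sketch of how an application could request a larger
mapped buffer per stream might look like the code below (assuming the usual
<sfio.h> entry points sfopen()/sfsetbuf()/sfclose(); the 8MB value and the
file name are only placeholders, not a proposed default):

    #include <sfio.h>

    int main(void)
    {
        Sfio_t *f = sfopen(NULL, "big.dat", "r");  /* "big.dat" is just an example */

        if (!f)
            return 1;

        /* NULL buffer + size: ask sfio for a buffer/mapping of about 8MB
         * for this stream instead of the compiled-in default size. */
        sfsetbuf(f, NULL, 8 * 1024 * 1024);

        /* ... sfread()/sfseek() as usual ... */

        sfclose(f);
        return 0;
    }
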
Erm... do you mean "Unix processes" in this case? Note that the size of the
MMU entries doesn't matter in today's MMU designs... basically each Unix
process has its own "MMU context" and switching between them is fast
(regardless of size).

> You need to think about cases when many
> processes do large I/O at the same time and the physical memory available on
> the machine is far less than what the VA space can accommodate.

Uhm... this is usually handled gracefully... however there are corner cases
when the machine itself does not have enough memory left for kernel tasks
and/or filesystem pages compete directly with application/code pages (for
example see http://sysunconfig.net/unixtips/priority_paging.txt ... old
Solaris 7 once invented "priority paging" to deal with that; later Solaris
releases solved the problem differently)... but at that point the system is
in trouble anyway and there is no easy way to fix that.

And AFAIK the "sliding window" approach currently used by sfio doesn't
prevent that... it only creates synchronisation points (e.g. the |mmap()|
and |munmap()| calls) at which the process will wait until resources have
been reclaimed by the kernel and made available again... but the overall
costs are much higher in terms of "waiting time". This doesn't sound
problematic on a machine with 4 CPUs... but on machines like the SPARC-T4
with 256 CPUs this can quickly ramp up to a devastating 15-20 seconds (!!)
of extra overhead on a loaded system just to get the window moved to the
next position (compared to just mapping the whole file in one large chunk).

> It's little known but Sfio does adaptive buffer filling to reduce read I/O,
> esp. when many seeks are done (hence most read data are wasted). The same
> strategy could be adapted to mapped I/O. We'll look into that.

Erm... what exactly does that mean? Note that today's kernels assume that
file I/O via |mmap()| is done by mapping the whole file or large chunks
(e.g. 8GB etc.) and then MMU entries are filled in on the fly when an
access is made. Something like a "sliding window" which creates lots of
|mmap()| and |munmap()| calls is *extremely* expensive and doesn't scale
on machines with many CPUs.

In short: |mmap()| and |munmap()| are expensive calls, but accessing the
mapped file is a lot cheaper than |read()| or |write()|. That's why the
current sfio behaviour of "... mapping a tiny window of 512k, doing I/O and
then mapping the next window..." is very, very bad in terms of scalability
and system resource usage. If possible, files should be mapped with the
largest "chunk size" possible, or you'll run into conflict with what the
kernel (or better: Solaris and Linux) expects and is designed for.

----

Bye,
Roland

--
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
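
PS: To make the "map the whole file in one large chunk" point more concrete,
below is a minimal sketch (plain POSIX |open()|/|mmap()|/|munmap()| only,
nothing sfio-specific; the function name and the byte checksum are made up
for illustration): the file is mapped exactly once, the kernel fills in the
MMU entries lazily as the pages are touched, and there is exactly one
|munmap()| at the end. A "sliding window" variant would instead repeat the
|mmap()|+|munmap()| pair for every 512k window of the file.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Sum all bytes of a file via one single mapping (sketch; errors and
     * the empty-file case are simply folded into a -1 return value). */
    static long sum_bytes(const char *path)
    {
        struct stat st;
        int fd = open(path, O_RDONLY);

        if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0) {
            if (fd >= 0)
                close(fd);
            return -1;
        }

        unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                                MAP_PRIVATE, fd, 0);
        close(fd);                /* the mapping keeps the file contents alive */
        if (p == MAP_FAILED)
            return -1;

        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];          /* MMU entries get filled in on demand here */

        munmap(p, (size_t)st.st_size);   /* one map, one unmap */
        return sum;
    }
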
