On Mon, Apr 15, 2013 at 8:37 PM, Phong Vo <[email protected]> wrote:
>> From [email protected] Mon Apr 15 10:53:27 2013
>> Subject: Re: [ast-developers] mmap() for command substitutions still not  
>> living up to its fullest potential?
>> To: Phong Vo <[email protected]>
>> Cc: [email protected], [email protected]
>
>> On Mon, Apr 15, 2013 at 4:40 PM, Phong Vo <[email protected]> wrote:
>> >
>> > The default size of the mapped memory for a stream is driven by a #define
>> > in a header file so it isn't hard to change it. I believe an application 
>> > can
>> > also use sfsetbuf(f, NULL, size) to set a desired size for the mapped 
>> > buffer.
>> >
>> > Generally speaking, Sfio and our other core libraries like CDT and Vmalloc
>> > have been around for a very long time and their default parameters tend to
>> > stay as they are until someone notices a performance issue. Well, not so 
>> > much
>> > for Vmalloc anymore because it has been completely rewritten recently to 
>> > deal
>> > with concurrency, both multiple threads and multiple processes for shared 
>> > memory.
>> >
>> > Anyway, it is good that you bring up this issue with Sfio now.
>> > What do people think is a reasonable size to set the default mapped
>> > size to on a 64-bit machine? Keep in mind that there are apps with many
>> > dozens of files opened at the same time along with other large requirements for 
>> > memory.
>
>> I think Roland's patch from
>> http://lists.research.att.com/pipermail/ast-developers/2013q2/002431.html
>> is sufficient for now because long before we exhaust VA space we run
>> out of patience to wait for the jobs to complete :) If that's not
>> sufficient then I'll suggest the '44bit clamp' Roland proposed to
>> partition 64bit VA space into 65536 files with 44bit chunks mapped
>> simultaneously. That just leaves 16 times the same amount of memory
>> for anon memory pages (which is many  times the amount of world-wide
>> installed main memory in 2012).
>
> Exhausting VA space is not likely but keeping processes behaving nicely toward
> one another should be a good thing.

Erm... do you mean "Unix processes" in this case? Note that the size
of the MMU entries doesn't matter in today's MMU designs... each Unix
process has its own "MMU context", and switching between them is fast
regardless of the size of the mappings.

> You need to think about cases when many
> processes do large I/O at the same time and the physical memory available on
> the machine is far less than what the VA space can accommodate.

Uhm... this is usually handled gracefully. There are corner cases
where the machine no longer has enough memory left for kernel tasks,
and/or filesystem pages compete directly with application/code pages
(for example see http://sysunconfig.net/unixtips/priority_paging.txt ...
Solaris 7 once invented "priority paging" to deal with that; later
Solaris releases solved the problem differently)... but at that point
the system is in trouble anyway and there is no easy way to fix it.
And AFAIK the "sliding window" approach currently used by sfio doesn't
prevent that... it only creates synchronisation points (the |mmap()|
and |munmap()| calls) at which the process waits until the kernel has
reclaimed resources and made them available again... and the overall
cost in waiting time is much higher. This doesn't sound problematic on
a machine with 4 CPUs... but on machines like the SPARC T4 with 256
CPUs it can quickly ramp up to a devastating 15-20 seconds (!!) of
extra overhead just to move the window to the next position on a
loaded system (compared to mapping the whole file in one large chunk).

> It's little known but Sfio does adaptive buffer filling to reduce read I/O,
> esp. when many seeks are done (hence most read data are wasted). The same
> strategy could be adapted to mapped I/O. We'll look into that.

Erm... what exactly does that mean? Note that today's kernels assume
that file I/O via |mmap()| is done by mapping the whole file or large
chunks (e.g. 8GB etc.), with the MMU entries filled in on the fly as
accesses are made. Something like a "sliding window", which creates
lots of |mmap()| and |munmap()| calls, is *extremely* expensive and
doesn't scale on machines with many CPUs.
In short: |mmap()| and |munmap()| are expensive calls, but accessing
the mapped file is a lot cheaper than |read()| or |write()|. That's
why the current sfio behaviour of "... mapping a tiny window of 512k,
doing I/O and then mapping the next window..." is very bad in terms of
scalability and system resource usage. If possible, files should be
mapped in the largest chunks possible, or you'll run into conflict
with what the kernel (or rather: Solaris and Linux) expects and is
designed for.
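
Until the default gets changed, an application can at least ask for a
larger mapping itself via the |sfsetbuf(f, NULL, size)| call you
mentioned above. A small sketch (the 1GB value and the function name
are just examples, not recommendations):

    #include <sfio.h>

    /* open a file for reading and request a ~1GB buffer/mapping size
       instead of the small built-in default */
    static Sfio_t *open_with_big_map(const char *path)
    {
        Sfio_t *f = sfopen(NULL, path, "r");
        if (f)
            sfsetbuf(f, NULL, (size_t)1024*1024*1024);
        return f;
    }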

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
