You're right: on Linux, huge pages are currently supported only for anonymous
memory mappings; the page cache layer is a possible future use. Until then, a
large file mmap will wipe out the PTE cache.
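A minimal Linux sketch of the anonymous-mapping case: map an anonymous region and hint the kernel with `madvise(MADV_HUGEPAGE)`. The helper name `map_anon_thp` is hypothetical. The hint is advisory, is accepted only when the kernel has transparent huge page support, and can only take effect on 2M-aligned parts of the range.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map an anonymous, private region and, where the
 * kernel offers it, ask for transparent huge pages. *thp_hinted reports
 * whether madvise() accepted the hint, not whether huge pages were
 * actually used. Returns NULL if mmap() fails. */
static void *map_anon_thp(size_t len, int *thp_hinted)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *thp_hinted = 0;
    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_HUGEPAGE
    /* Fails (EINVAL) on kernels built without THP support. */
    if (madvise(p, len, MADV_HUGEPAGE) == 0)
        *thp_hinted = 1;
#endif
    return p;
}
```

There is no equivalent hint here for a file-backed mapping, which is the gap discussed in this thread.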

Regards,
John Haller


> -----Original Message-----
> From: ольга крыжановская [mailto:[email protected]]
> Sent: Friday, September 13, 2013 5:46 PM
> John, please correct me, but AFAIK Linux does not support large pages/huge
> pages for mmap() on files, right? AFAIK Solaris 11 was the first Unix that
> explicitly supported large pages/huge pages for mmap() on files.
> 
> Olga
> 
> On Sat, Sep 14, 2013 at 12:43 AM, Haller, John H (John)
> <[email protected]> wrote:
> > With any luck, systems with large allocations will be using transparent
> > huge pages where supported, and up to 2M is then just a single page table
> > entry. Unfortunately, that requires the 2M (or the size of the mmap, if
> > smaller) to be contiguous, and it's easy to run out of contiguous 2M
> > chunks of memory or pre-allocated contiguous regions. Without them, each
> > 2M falls back to 512 page table entries (2M / 4K) to be potentially
> > copied on fork. Whether the PTEs are copied depends on whether they are
> > just in VM and unmapped, and on whether unmapped PTEs in VM need to be
> > tracked, which is probably very OS dependent. But to keep fork cheap,
> > the PTEs in general can't be copied for the usual case of fork being
> > followed by exec. If the underlying mapped memory is accessed, the PTE
> > lookup would fault, and the PTE would need to be copied then. Ideally,
> > the only PTEs accessed are the one for the exec instruction, the ones
> > for its data, and the other PTEs in the same page(s). This probably
> > forces a copy of the PTE so the OS can keep track of how many PTEs refer
> > to the same memory location.
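The fork-then-exec pattern John describes, as a minimal POSIX sketch (the helper `run_shell` is hypothetical): the child does almost nothing before `execl()` replaces its address space, so it touches very few of the parent's pages.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c cmd` via fork() + exec and return the
 * child's exit status, or -1 on error. Between fork() and execl() the
 * child only touches the stack and the argument strings. */
static int run_shell(const char *cmd)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);               /* only reached if exec failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_shell("exit 3")` returns 3. Whether page table entries are copied during the fork() itself is up to the kernel; the sketch only shows how little the child uses before exec.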
> >
> > On Linux, you can find the number of preallocated hugepages in
> > /proc/sys/vm/nr_hugepages. Transparent hugepages may allocate hugepages
> > if contiguous memory can be found. Without huge pages, just allocating
> > the 131k PTEs for the mmap is likely to add some overhead to the grep
> > call, on top of the limited number of PTE cache entries in the
> > processor. With preallocated hugepages being a limited resource, I'm not
> > sure how many one would want to allocate for one process. PTE cache
> > misses are as expensive as memory cache misses. Because of transparent
> > hugepages, one might get better performance on a freshly booted machine
> > with lots of free memory than after all the memory has been allocated at
> > least once. On other OSs, your mileage may vary, but the number of PTE
> > cache entries in the processor will remain constant.
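A small sketch for inspecting these settings on Linux. The helper names are hypothetical, and `/sys/kernel/mm/transparent_hugepage/enabled` is an addition not mentioned in the message: it lists the THP modes with the active one in brackets, e.g. `always [madvise] never`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one integer from a /proc or /sys file such as
 * /proc/sys/vm/nr_hugepages. Returns -1 if the file is missing or
 * unreadable (non-Linux systems, restricted containers). */
static long read_long_file(const char *path)
{
    long v = -1;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Hypothetical helper: extract the bracketed (active) word from a THP
 * "enabled" line into out (size n). Returns 0, or -1 if none is found. */
static int thp_active_mode(const char *line, char *out, size_t n)
{
    const char *open = strchr(line, '[');
    const char *close = open ? strchr(open, ']') : NULL;
    size_t len;
    if (open == NULL || close == NULL || n == 0)
        return -1;
    len = (size_t)(close - open - 1);
    if (len >= n)
        len = n - 1;              /* truncate to fit the buffer */
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

On a system with THP in `madvise` mode, only regions marked with `madvise(MADV_HUGEPAGE)` are candidates for transparent huge pages.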
> >
> > FWIW, when Intel was developing their Data Plane Development Kit, to get
> > decent performance they needed to allocate the packet buffers in
> > preallocated huge pages, as PTE cache misses were ruining performance.
> > The driver can now DMA the packet directly from the NIC into cache, so
> > there are no cache misses there. At 10Gbps with 64-byte packets, 2 cache
> > misses take longer than the time from the end of one packet to the end
> > of the next.
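The per-packet time budget, worked as a sketch. It assumes the 64 bytes are the Ethernet frame including FCS, plus the standard 20 bytes of line time per frame for preamble, start-of-frame delimiter and minimum inter-frame gap; the helper name is hypothetical.

```c
/* Assumed Ethernet per-frame wire overhead: 7 bytes preamble + 1 byte
 * start-of-frame delimiter + 12 bytes minimum inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20.0

/* Hypothetical helper: nanoseconds between back-to-back frames of
 * frame_bytes on a link of link_gbps gigabits per second. */
static double packet_interval_ns(double frame_bytes, double link_gbps)
{
    double bits = (frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8.0;
    return bits / link_gbps;      /* bits / (Gbit/s) = ns */
}
```

At 10Gbps this gives 67.2 ns per 64-byte packet (about 14.88 million packets per second), so two misses exceed the budget whenever each one costs more than about 33.6 ns.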
> >
> > Regards,
> > John Haller
> >
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:ast-users-
> >> [email protected]] On Behalf Of Glenn Fowler
> >> Sent: Friday, September 13, 2013 5:00 PM
> >> Subject: Re: [ast-users] Thank you for the grep builtin!
> >>
> >>
> >> we're getting close
> >> again we're not interested in the pages but the metadata for the pages
> >>
> >> this may be based on incorrect assumptions ...
> >> 1GiB mapped and 8KiB page size => 131072 entries for address-to-page
> >> lookup
> >> at fork() time the parent process has that 131072-entry table in hand
> >> what does the child get? a copy of that 131072-entry table or a reference?
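Glenn's entry count, plus the huge-page cases from John's reply, as plain division. This is a sketch: it counts leaf entries only, the helper name is hypothetical, and the 8-byte entry size used for the table-size figure is an assumption (it holds for x86-64 page tables).

```c
#include <stdint.h>

#define KiB ((uint64_t)1 << 10)
#define MiB ((uint64_t)1 << 20)
#define GiB ((uint64_t)1 << 30)

/* Hypothetical helper: page-table entries needed to map len bytes with
 * pages of page_size bytes, rounding up for a partial last page. */
static uint64_t pte_count(uint64_t len, uint64_t page_size)
{
    return (len + page_size - 1) / page_size;
}
```

`pte_count(GiB, 8 * KiB)` is 131072, which at 8 bytes per entry is 1 MiB of entries; with 4K pages a 2M region needs 512 entries, and 1GiB in 2M pages needs only 512.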
> >>
> >> On Fri, 13 Sep 2013 23:26:34 +0200
> >> ольга крыжановская wrote:
> >> > No, this is not copy-on-write, this is
> >> > check-what-to-do-on-access-when-not-mapped. The short explanation is
> >> > that fork() is not the time when an action in the VM system happens;
> >> > the action happens at the first access, in the current process, to a
> >> > page that is not mapped yet. What is copied at fork() time is the
> >> > range information, i.e. mapping from/to/flags, but not the individual
> >> > pages. So the number of mapped areas is a concern at fork() time, but
> >> > not their size.
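To see the per-range bookkeeping Olga describes, a Linux-only sketch: each line of `/proc/self/maps` is one mapped range, and a large anonymous mapping created by one `mmap()` call falls inside a single line whatever its size. The helper name is hypothetical; it shows what is tracked per range and does not measure fork() cost.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>

/* Hypothetical helper (Linux only): return 1 if one line of
 * /proc/self/maps covers [addr, addr + len), 0 if not, -1 if the file
 * cannot be read. Lines start with "start-end" in hex. */
static int covered_by_one_range(const void *addr, size_t len)
{
    uintptr_t lo = (uintptr_t)addr, hi = lo + len;
    char line[8192];
    int found = 0;
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            start <= lo && hi <= end)
            found = 1;
    }
    fclose(f);
    return found;
}
```

The kernel may merge the new mapping with a compatible neighbour, which still leaves it inside one line; either way, a 1GiB mapping costs one range entry, not 131072.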
> >>
> >> > Olga
> >>
> >> > On Fri, Sep 13, 2013 at 11:20 PM, Glenn Fowler <[email protected]> wrote:
> >> > >
> >> > > On Fri, 13 Sep 2013 23:14:22 +0200
> >> > > ольга крыжановская wrote:
> >> > >> Glenn, shared mmap() mappings do not have any impact on fork()
> >> > >> performance, at least on VM architectures that can share pages
> >> > >> (this has been common practice since at least System V, and no
> >> > >> modern Unix or Linux exists that does not do copy-on-write; more
> >> > >> on that below). The pages are not even touched or looked at at
> >> > >> fork() time, so even millions of mmap()ed pages have no impact.
> >> > >> Only when the pages are touched does the VM system realize that a
> >> > >> fork() has happened, and it *may* create a copy-on-write copy if
> >> > >> you write to them. If you only read the pages, nothing will happen.
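The semantics Olga describes can be checked with a small POSIX sketch (the helper name is hypothetical): after fork(), a write to a private mapping lands in a copy-on-write copy, while a shared mapping keeps one page for both processes. It demonstrates behaviour only, not fork() timing.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map one page with share_flag (MAP_PRIVATE or
 * MAP_SHARED), store 1, fork, let the child store 2, and return what the
 * parent then sees (-1 on error). MAP_PRIVATE: the child's write goes to
 * its own copy-on-write page, so the parent sees 1. MAP_SHARED: both
 * processes use the same page, so the parent sees 2. */
static int value_after_child_write(int share_flag)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    int *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                  share_flag | MAP_ANONYMOUS, -1, 0);
    int status, result;
    pid_t pid;

    if (p == MAP_FAILED)
        return -1;
    *p = 1;
    pid = fork();
    if (pid < 0) {
        munmap(p, (size_t)pagesz);
        return -1;
    }
    if (pid == 0) {
        *p = 2;                   /* first write after fork() */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        result = -1;
    else
        result = *p;
    munmap(p, (size_t)pagesz);
    return result;
}
```

`value_after_child_write(MAP_PRIVATE)` returns 1 and `value_after_child_write(MAP_SHARED)` returns 2.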
> >> > >
> >> > > thanks
> >> > >
> >> > > we weren't concerned about the pages themselves but the TLB or
> >> > > whatever the vm system uses to keep track of pages that has to be
> >> > > duped on fork(), no?
> >> > > or are you saying even that is copy on write?
> >> > >
> >>
> >> > --
> >> >       ,   _                                    _   ,
> >> >      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
> >> > .----'-/`-/     [email protected]   \-`\-'----.
> >> >  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
> >> >       /\/\     Solaris/BSD//C/C++ programmer   /\/\
> >> >       `--`                                      `--`
> >>
> >> _______________________________________________
> >> ast-users mailing list
> >> [email protected]
> >> http://lists.research.att.com/mailman/listinfo/ast-users
> 