You're right; on Linux, transparent huge pages are currently supported
only for anonymous memory mappings, with the page cache layer a possible
future use. Until that happens, a large file-backed mmap() will still
churn through the PTE cache.
Regards,
John Haller

> -----Original Message-----
> From: ольга крыжановская [mailto:[email protected]]
> Sent: Friday, September 13, 2013 5:46 PM
>
> John, please correct me, but AFAIK Linux does not support large
> pages/huge pages for mmap() on files, right? AFAIK Solaris 11 was the
> first Unix which explicitly supports large pages/huge pages for
> mmap() on files.
>
> Olga
>
> On Sat, Sep 14, 2013 at 12:43 AM, Haller, John H (John)
> <[email protected]> wrote:
> > With any luck, the systems with large allocations will be using
> > transparent huge pages on systems which support them, and up to 2M
> > is just a single page table entry. Unfortunately, that requires
> > that the 2M (or the size of the mmap, if smaller) be contiguous,
> > and it's easy to run out of contiguous 2M chunks of memory or
> > pre-allocated contiguous regions. That brings it down to 512 page
> > table entries to be potentially copied on fork. Whether the PTEs
> > are copied depends on whether they are just in VM and unmapped, and
> > on whether PTEs in VM which are unmapped need to be tracked, which
> > is probably very OS dependent. But to have a low-cost fork, the
> > PTEs in general can't be copied for the usual case of fork being
> > followed by exec. If the underlying mapped memory is accessed, the
> > PTE lookup would fault, and the PTE would need to be copied then.
> > Ideally, the only PTEs to be accessed are the one for the exec
> > instruction, the PTEs for its data, and the other PTEs in the same
> > page(s). This probably forces a copy of the PTE so the OS can keep
> > track of how many PTEs refer to the same memory location.
> >
> > On Linux, you can find the number of preallocated hugepages in
> > /proc/sys/vm/nr_hugepages. Transparent hugepages may allocate
> > hugepages if contiguous memory can be found. Without huge pages,
> > just allocating the 131072 PTEs for the mmap is likely to add some
> > overhead to the grep call, along with pressure on the limited
> > number of PTE cache entries in the processor.
> > With hugepages being a limited resource, I'm not sure how many one
> > would want to allocate for one process. PTE cache misses are as
> > expensive as memory cache misses. Because of transparent hugepages,
> > one might get better performance on a freshly booted machine with
> > lots of free memory than after all the memory has been allocated at
> > least once. On other OSs your mileage may vary, but the number of
> > PTE cache entries will remain constant.
> >
> > FWIW, when Intel was developing their Data Plane Development Kit,
> > to get decent performance they needed to allocate the packet
> > buffers in huge pages, as the PTE cache misses were ruining
> > performance. The driver can now DMA a packet from the NIC directly
> > into cache, so no cache misses there. At 10 Gbps and 64-byte
> > packets, 2 cache misses take longer than the time from the end of
> > one packet to the end of the next packet.
> >
> > Regards,
> > John Haller
> >
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf
> >> Of Glenn Fowler
> >> Sent: Friday, September 13, 2013 5:00 PM
> >> Subject: Re: [ast-users] Thank you for the grep builtin!
> >>
> >> we're getting close
> >> again we're not interested in the pages but the metadata for the
> >> pages
> >>
> >> this may be based on incorrect assumptions ...
> >> 1 GiB mapped and 8 KiB page size => 131072 entries for
> >> address-to-page lookup
> >> at fork() time the parent process has that 131072 entry table in hand
> >> what does the child get? a copy of that 131072 entry table or a
> >> reference?
> >>
> >> On Fri, 13 Sep 2013 23:26:34 +0200 ольга крыжановская wrote:
> >> > No, this is not copy on write, this is
> >> > check-what-to-do-on-access-when-not-mapped.
> >> > The short explanation is that fork() is not the time when an
> >> > action in the VM system will happen; the action happens at the
> >> > first access, in the current process, to a page which is not yet
> >> > mapped. What is copied at fork() time is the range information,
> >> > i.e. the mapping from/to/flags, but not the individual pages. So
> >> > the number of mapped areas is a concern at fork() time, but not
> >> > their size.
> >> >
> >> > Olga
> >> >
> >> > On Fri, Sep 13, 2013 at 11:20 PM, Glenn Fowler
> >> > <[email protected]> wrote:
> >> > >
> >> > > On Fri, 13 Sep 2013 23:14:22 +0200 ольга крыжановская wrote:
> >> > >> Glenn, shared mmap() mappings do not have any impact on
> >> > >> fork() performance, at least on VM architectures which can
> >> > >> share pages (this has been common practice since at least
> >> > >> System V, and no modern Unix or Linux exists which does not
> >> > >> do copy-on-write, but more on that below). The pages are not
> >> > >> even touched, or looked at, at fork() time, so even millions
> >> > >> of mmap() pages have no impact. Only if the pages are touched
> >> > >> will the VM system realize a fork() has happened, and it
> >> > >> *may* create a copy-on-write copy if you write to them. If
> >> > >> you only read the pages, nothing will happen.
> >> > >
> >> > > thanks
> >> > >
> >> > > we weren't concerned about the pages themselves but the TLB or
> >> > > whatever the vm system uses to keep track of pages that has to
> >> > > be duped on fork(), no?
> >> > > or are you saying even that is copy on write?
> >> >
> >> > --
> >> > , _ _ ,
> >> > { \/`o;====- Olga Kryzhanovska -====;o`\/ }
> >> > .----'-/`-/ [email protected] \-`\-'----.
> >> > `'-..-| / http://twitter.com/fleyta \ |-..-'`
> >> > /\/\ Solaris/BSD//C/C++ programmer /\/\
> >> > `--` `--`
> >>
> >> _______________________________________________
> >> ast-users mailing list
> >> [email protected]
> >> http://lists.research.att.com/mailman/listinfo/ast-users
>
> --
> , _ _ ,
> { \/`o;====- Olga Kryzhanovska -====;o`\/ }
> .----'-/`-/ [email protected] \-`\-'----.
> `'-..-| / http://twitter.com/fleyta \ |-..-'`
> /\/\ Solaris/BSD//C/C++ programmer /\/\
> `--` `--`

_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
