John, please correct me, but AFAIK Linux does not support large pages/huge pages for mmap() on files, right? AFAIK Solaris 11 was the first Unix to explicitly support large pages/huge pages for mmap() on files.
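For the archives, a minimal sketch of the Linux side as I understand it: huge pages for a *file* mapping only work when the file lives on a hugetlbfs mount; a regular file on ext4/ufs gets normal-size pages no matter what. The /mnt/huge mount point and the 2M page size are assumptions (x86_64, hugetlbfs mounted with "mount -t hugetlbfs none /mnt/huge"), not portable interfaces:

    /* hugemap.c -- map a file that lives on hugetlbfs.
     * Assumes x86_64 with 2M huge pages and hugetlbfs mounted at
     * /mnt/huge; both are assumptions for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define LEN (2UL * 1024 * 1024)        /* one 2M huge page */

    int main(void)
    {
        int fd = open("/mnt/huge/buf", O_CREAT | O_RDWR, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* hugetlbfs sizes the file at mmap() time; the length must be
         * a multiple of the huge page size. */
        char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        p[0] = 1;    /* backed by one 2M page: one TLB entry, not 512 */

        munmap(p, LEN);
        close(fd);
        unlink("/mnt/huge/buf");
        return 0;
    }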
Olga

On Sat, Sep 14, 2013 at 12:43 AM, Haller, John H (John)
<[email protected]> wrote:
> With any luck, the systems with large allocations will be using
> transparent huge pages on systems which support them, and up to 2M is
> just a single page table entry. Unfortunately, that requires that the
> 2M (or the size of the mmap, if lower) be contiguous, and it's easy
> to run out of contiguous 2M chunks of memory or pre-allocated
> contiguous regions. That brings it down to 512 page table entries to
> be potentially copied on fork. Whether the PTEs are copied depends on
> whether they are just in VM and unmapped, and on whether PTEs in VM
> which are unmapped need to be tracked, which is probably very OS
> dependent. But, to have a low-cost fork, the PTEs in general can't be
> copied for the usual case of fork being followed by exec. If the
> underlying mapped memory is accessed, the PTE lookup would fault, and
> the PTE would need to be copied then. Ideally, the only PTEs to be
> accessed are the one for the exec instruction, the PTEs for its data,
> and the other PTEs in the same page(s). This probably forces a copy
> of the PTE so the OS can keep track of how many PTEs refer to the
> same memory location.
>
> On Linux, you can find the number of preallocated hugepages in
> /proc/sys/vm/nr_hugepages. Transparent hugepages may allocate
> hugepages if contiguous memory can be found. Without huge pages, just
> allocating the 131K of PTEs for the mmap is likely to add some
> overhead to the grep call, along with the limited number of PTE cache
> entries in the processor. With hugepages being a limited resource,
> I'm not sure how many one would want to allocate for one process. PTE
> cache misses are as expensive as memory cache misses. Because of
> transparent hugepages, one might get better performance on a freshly
> booted machine with lots of free memory than after all the memory has
> been allocated at least once. On other OSs your mileage may vary, but
> the number of PTE cache entries will remain constant.
>
> FWIW, when Intel was developing their Data Plane Development Kit, to
> get decent performance they needed to allocate the packet buffers in
> allocated huge pages, as PTE cache misses were ruining performance.
> The driver can now directly DMA the packet into cache from a NIC, so
> no cache misses there. At 10Gbps and 64-byte packets, 2 cache misses
> take longer than the time from the end of one packet to the end of
> the next packet.
>
> Regards,
> John Haller
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of
>> Glenn Fowler
>> Sent: Friday, September 13, 2013 5:00 PM
>> Subject: Re: [ast-users] Thank you for the grep builtin!
>>
>> we're getting close
>> again we're not interested in the pages
>> but the metadata for the pages
>>
>> this may be based on incorrect assumptions ...
>> 1GiB mapped and 8KiB page size => 131072 entries for address-to-page
>> lookup
>> at fork() time the parent process has that 131072 entry table in hand
>> what does the child get? a copy of that 131072 entry table or a
>> reference?
>>
>> On Fri, 13 Sep 2013 23:26:34 +0200 Olga Kryzhanovska wrote:
>> > No, this is not copy on write, this is
>> > check-what-to-do-on-access-when-not-mapped.
>> > The short explanation is that fork() is not the time when an
>> > action in the VM system will happen; it's the time of the first
>> > access to a page which is not yet mapped in the current process
>> > when an action will happen. What is copied at fork() time is the
>> > range information, i.e. the mapping from/to/flags, but not the
>> > individual pages. So the number of mapped areas is a concern at
>> > fork() time, but not their size.
>> >
>> > Olga
>> >
>> > On Fri, Sep 13, 2013 at 11:20 PM, Glenn Fowler <[email protected]>
>> > wrote:
>> > >
>> > > On Fri, 13 Sep 2013 23:14:22 +0200 Olga Kryzhanovska wrote:
>> > >> Glenn, shared mmap() mappings do not have any impact on fork()
>> > >> performance, at least on VM architectures which can share pages
>> > >> (this has been common practice since at least System V, and no
>> > >> modern Unix or Linux exists which does not do copy-on-write,
>> > >> but more on that below). The pages are not even touched or
>> > >> looked at at fork() time, so even millions of mmap() pages have
>> > >> no impact.
>> > >> Only when the pages are touched will the VM system realize a
>> > >> fork() has happened, and it *may* create a copy-on-write copy
>> > >> if you write to them. If you only read the pages nothing will
>> > >> happen.
>> > >
>> > > thanks
>> > >
>> > > we weren't concerned about the pages themselves but the TLB or
>> > > whatever the vm system uses to keep track of pages that has to
>> > > be duped on fork(), no?
>> > > or are you saying even that is copy on write?
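Coming back to John's Linux notes above, a minimal sketch of the two mechanisms he mentions: reading the preallocated pool size from /proc/sys/vm/nr_hugepages, and opting an anonymous region into transparent huge pages with madvise(MADV_HUGEPAGE). This assumes a Linux kernel with THP (2.6.38 or later); the 64M region size is an arbitrary choice for illustration:

    /* thp.c -- sketch of the two Linux huge page mechanisms John
     * describes: the preallocated pool and transparent huge pages. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #define LEN (64UL * 1024 * 1024)   /* arbitrary 64M test region */

    int main(void)
    {
        /* Size of the preallocated pool John mentions. */
        FILE *f = fopen("/proc/sys/vm/nr_hugepages", "r");
        if (f) {
            long n;
            if (fscanf(f, "%ld", &n) == 1)
                printf("preallocated huge pages: %ld\n", n);
            fclose(f);
        }

        void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        /* Ask for transparent huge pages; the kernel honors this only
         * if it can find contiguous 2M chunks, which is exactly what
         * runs out on a long-lived machine. */
        if (madvise(p, LEN, MADV_HUGEPAGE) != 0)
            perror("madvise(MADV_HUGEPAGE)");

        munmap(p, LEN);
        return 0;
    }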
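And a sketch of the experiment behind Glenn's question: map and fault in a large shared region, then time fork() with a child that touches nothing, as the usual fork-then-exec case does. On Linux, a shared mapping has no private copy-on-write state, so the kernel can skip the page table copy at fork() and refill on fault; changing MAP_SHARED to MAP_PRIVATE should show the page tables actually being copied. Whether other OSs do the same is, as discussed, OS dependent. Glenn's 8KiB pages give 131072 entries for 1GiB; on x86 with 4K pages it is 262144. This assumes the machine has 1 GiB to spare:

    /* forkcost.c -- does a big mapping make fork() slow?
     * Maps 1 GiB (Glenn's example size), faults every page in, then
     * times fork()+wait with a child that touches nothing. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 1UL << 30;        /* 1 GiB, as in Glenn's example */
        char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        memset(p, 1, len);             /* fault in all 262144 4K pages */

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);                  /* child: touch nothing */
        waitpid(pid, NULL, 0);
        gettimeofday(&t1, NULL);

        printf("fork+wait: %ld us\n",
               (t1.tv_sec - t0.tv_sec) * 1000000L
               + (t1.tv_usec - t0.tv_usec));
        munmap(p, len);
        return 0;
    }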
--
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska    -====;o`\/ }
 .----'-/`-/     [email protected]      \-`\-'----.
  `'-..-| /      http://twitter.com/fleyta       \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer     /\/\
      `--`                                       `--`

_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users