John, please correct me, but AFAIK Linux does not support large
pages/huge pages for mmap() on files, right? AFAIK Solaris 11 was the
first Unix to explicitly support large pages/huge pages for mmap() on
files.
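For what it's worth, a minimal sketch of the Linux side of that
distinction (my illustration, not anything from the thread; assumes a
recent Linux with huge pages reserved by the admin): anonymous mappings
can request huge pages with MAP_HUGETLB, but a plain mmap() of a
regular file has no equivalent -- only files on a hugetlbfs mount get
huge pages.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 2UL * 1024 * 1024;     /* one 2M huge page */
        /* works for anonymous memory; there is no flag that turns an
           ordinary file-backed mapping into a huge-page mapping */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");    /* fails if no huge pages reserved */
            return 1;
        }
        ((char *)p)[0] = 1;                 /* touch it to fault the page in */
        munmap(p, len);
        return 0;
    }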

Olga

On Sat, Sep 14, 2013 at 12:43 AM, Haller, John H (John)
<[email protected]> wrote:
> With any luck, the systems with large allocations will be using transparent
> huge pages on systems which support them, so up to 2M is just a single page
> table entry. Unfortunately, that requires that the 2M (or the size of the
> mmap, if smaller) be contiguous, and it's easy to run out of contiguous 2M
> chunks of memory or pre-allocated contiguous regions. That brings it down to
> 512 page table entries to be potentially copied on fork. Whether the PTEs
> are copied depends on whether they are just in VM and unmapped, and whether
> PTEs in VM which are unmapped need to be tracked, which is probably very OS
> dependent. But, to have a low-cost fork, the PTEs in general can't be copied
> for the usual case of fork being followed by exec. If the underlying mapped
> memory is accessed, the PTE lookup would fault, and the PTE would need to
> be copied then. Ideally, the only PTEs to be accessed are the one for the
> exec instruction, the PTEs for its data, and the other PTEs in the same
> page(s). This probably forces a copy of the PTE so the OS can keep track of
> how many PTEs refer to the same memory location.
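A hedged sketch of the transparent-huge-page request John describes (my
example, assuming Linux with CONFIG_TRANSPARENT_HUGEPAGE;
madvise(MADV_HUGEPAGE) is only a hint, and whether the region really
gets 2M pages depends on contiguous memory being available, exactly as
above):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        size_t len = 64UL * 1024 * 1024;   /* 64M region */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        /* hint only: the kernel installs 2M pages on 2M-aligned stretches
           of the region when it can find contiguous memory */
        if (madvise(p, len, MADV_HUGEPAGE) != 0)
            perror("madvise(MADV_HUGEPAGE)");
        ((char *)p)[0] = 1;                /* first touch may fault in a 2M page */
        munmap(p, len);
        return 0;
    }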
>
> On Linux, you can find the number of preallocated hugepages in
> /proc/sys/vm/nr_hugepages. Transparent hugepages may allocate hugepages if
> contiguous memory can be found. Without huge pages, just allocating the
> 131k PTEs for the mmap is likely to add some overhead to the grep call,
> along with the limited number of PTE cache entries in the processor. Since
> hugepages are a limited resource, I'm not sure how many one would want to
> allocate for one process. PTE cache misses are as expensive as memory cache
> misses. Because of transparent hugepages, one might get better performance
> on a freshly booted machine with lots of free memory than after all the
> memory has been allocated at least once. On other OSs, your mileage may
> vary, but the number of PTE cache entries will remain constant.
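A small sketch of the check John mentions (my illustration; the path is
the one he gives, and the HugePages_* lines in /proc/meminfo show the
free and reserved counts as well):

    #include <stdio.h>

    int main(void) {
        FILE *f = fopen("/proc/sys/vm/nr_hugepages", "r");
        long n = 0;
        if (f == NULL) { perror("nr_hugepages"); return 1; }
        if (fscanf(f, "%ld", &n) != 1) { fclose(f); return 1; }
        fclose(f);
        printf("preallocated huge pages: %ld\n", n);  /* pool size, not free count */
        return 0;
    }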
>
> FWIW, when Intel was developing their Data Plane Development Kit, they
> needed to place the packet buffers in preallocated huge pages to get decent
> performance, as the PTE cache misses were ruining it. The driver can now
> DMA the packet from a NIC directly into cache, so no cache misses there. At
> 10Gbps and 64-byte packets, 2 cache misses take longer than the time from
> the end of one packet to the end of the next packet.
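Back-of-envelope, for the curious (my numbers, assuming minimum-size
Ethernet frames): a 64-byte frame plus 8-byte preamble and 12-byte
inter-frame gap occupies 84 bytes = 672 bits on the wire, i.e. about
67 ns per packet at 10 Gb/s. With a miss to DRAM commonly costing on
the order of 60-100 ns, two misses (roughly 120-200 ns) already exceed
one packet time, which is John's point.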
>
> Regards,
> John Haller
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:ast-users-
>> [email protected]] On Behalf Of Glenn Fowler
>> Sent: Friday, September 13, 2013 5:00 PM
>> Subject: Re: [ast-users] Thank you for the grep builtin!
>>
>>
>> we're getting close
>> again we're not interested in the pages
>> but the metadata for the pages
>>
>> this may be based on incorrect assumptions ...
>> 1GiB mapped and 8KiB page size => 131072 entries for address-to-page
>> lookup
>> at fork() time the parent process has that 131072 entry table in hand
>> what does the child get? a copy of that 131072 entry table or a reference?
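For reference, the arithmetic behind Glenn's numbers: 2^30 bytes /
2^13 bytes per page = 2^17 = 131072 entries; at a typical 8 bytes per
entry that is a 1 MiB table, and it is presumably the 131k PTEs John's
reply refers to.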
>>
>> On Fri, 13 Sep 2013 23:26:34 +0200 Olga Kryzhanovska wrote:
>> > No, this is not copy on write, this is
>> > check-what-to-do-on-access-when-not-mapped. The short explanation is
>> > that fork() is not the time when an action in the VM system will
>> > happen; it's the time of the first access to a page which is not yet
>> > mapped in the current process when an action will happen. What is
>> > copied at fork() time is the range information, i.e. mapping
>> > from/to/flags, but not the individual pages. So the number of mapped
>> > areas is a concern at fork() time, but not their size.
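A quick sketch of the behavior Olga describes (my demo, Linux/POSIX
assumed): fork() after a large shared mapping copies only the mapping
metadata, so the measured fork cost should not scale with the mapping
size.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        size_t len = 1UL << 30;                 /* 1 GiB, never touched */
        void *p = mmap(NULL, len, PROT_READ,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);
        pid_t pid = fork();
        if (pid == 0) _exit(0);                 /* child exits immediately */
        gettimeofday(&t1, NULL);
        waitpid(pid, NULL, 0);

        printf("fork took %ld us with 1 GiB mapped\n",
               (t1.tv_sec - t0.tv_sec) * 1000000L +
               (t1.tv_usec - t0.tv_usec));
        munmap(p, len);
        return 0;
    }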
>>
>> > Olga
>>
>> > On Fri, Sep 13, 2013 at 11:20 PM, Glenn Fowler <[email protected]> 
>> > wrote:
>> > >
>> > > On Fri, 13 Sep 2013 23:14:22 +0200 Olga Kryzhanovska wrote:
>> > >> Glenn, shared mmap() mappings do not have any impact on fork()
>> > >> performance, at least on VM architectures which can share pages (this
>> > >> has been common practice since at least System V, and no modern Unix
>> > >> or Linux exists which does not do copy-on-write, but more on that
>> > >> below). The pages are not even touched, or looked at, at fork() time,
>> > >> so even millions of mmap() pages have no impact.
>> > >> Only when the pages are touched will the VM system realize a fork()
>> > >> has happened, and it *may* create a copy-on-write copy if you write
>> > >> to them. If you only read the pages, nothing will happen.
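And a sketch of the copy-on-write case in that last paragraph (again my
demo, not Olga's code): with MAP_PRIVATE, parent and child share the
page after fork(); the first write faults and gives the writer its own
copy, so the other side never sees the change.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        strcpy(p, "before fork");

        if (fork() == 0) {                  /* child */
            strcpy(p, "child wrote this");  /* COW fault: child gets a private copy */
            _exit(0);
        }
        wait(NULL);
        printf("parent still sees: \"%s\"\n", p);   /* "before fork" */
        return 0;
    }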
>> > >
>> > > thanks
>> > >
>> > > we weren't concerned about the pages themselves but the TLB or
>> > > whatever the vm system uses to keep track of pages that has to be
>> > > duped on fork(), no?
>> > > or are you saying even that is copy on write?
>> > >
>>



-- 
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     [email protected]   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
_______________________________________________
ast-users mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-users
