On 12/09/2016 02:37 AM, Kirill A. Shutemov wrote:
> On other hand, large virtual address space would put more pressure on
> cache -- at least one more page table per process, if we make 56-bit VA
> default.

For a process only using a small amount of its address space, the mid-level paging structure caches will be very effective since the page walks are all very similar. You may take a cache miss on the extra level on the *first* walk, but you only do that once per context switch. I bet the CPU is also pretty aggressive about filling those things when it sees a new CR3 and they've been forcibly emptied. So, you may never even _see_ the latency from that extra miss.
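
To make the "page walks are all very similar" point concrete, here is a rough sketch of how a virtual address splits into per-level table indices. It assumes 4 KiB pages, 9 index bits per level, and a hypothetical 5-level layout; the names pt_index() and shared_upper_levels() are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define INDEX_BITS 9  /* assumption: 512 entries per table */
#define NUM_LEVELS 5  /* assumption: hypothetical 5-level layout */

/* Index into the table at `level` (0 = leaf table, NUM_LEVELS - 1 = top). */
static unsigned pt_index(uint64_t va, int level)
{
    assert(level >= 0 && level < NUM_LEVELS);
    return (unsigned)((va >> (PAGE_SHIFT + INDEX_BITS * level)) & 0x1ff);
}

/*
 * Count how many levels, starting from the top, two addresses walk
 * through using the same table entry. Those are the entries a
 * paging-structure cache could reuse for the second walk.
 */
static int shared_upper_levels(uint64_t a, uint64_t b)
{
    int shared = 0;

    for (int level = NUM_LEVELS - 1; level >= 0; level--) {
        if (pt_index(a, level) != pt_index(b, level))
            break;
        shared++;
    }
    return shared;
}
```

With these assumptions, two adjacent pages such as 0x7f0000001000 and 0x7f0000002000 share the top four levels and differ only in the leaf index, so only the first walk has to fetch the upper-level entries from memory.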

