Hi Gil, Very interesting… I've downloaded the Meltdown and Spectre papers but have just started reading them. My first reaction was also "wow", followed by "of course…". We make use of the PCID, we just never thought to use it this way. I'm still curious to understand how speculative execution can be used as an attack vector. It feels like the solution is to force more cache flushing between context switches.
— Kirk

> On Jan 7, 2018, at 8:54 PM, Gil Tene <[email protected]> wrote:
>
> I'm sure people here have heard enough about the Meltdown vulnerability and the rush of Linux fixes that have to do with addressing it. So I won't get into how the vulnerability works here (my one-word reaction to the simple code snippets showing "remote sensing" of protected data values was "Wow").
>
> However, in examining both the various fixes rolled out in actual Linux distros over the past few days and doing some very informal surveying of environments I have access to, I discovered that the PCID processor feature, which used to be a virtual no-op, is now a performance AND security critical item. In the spirit of [mechanically] sympathizing with the many systems that now use PCID for a new purpose, as well as with the gap between the haves/have-nots in the PCID world, let me explain why:
>
> The PCID (Process-Context ID) feature on x86-64 works much like the more generic ASID (Address Space ID) feature available on many hardware platforms for decades. Simplistically, it allows TLB-cached page table contents to be tagged with a context identifier, and limits lookups in the TLB to matches within the currently allowed context only. TLB-cached entries with a different PCID will be ignored. Without this feature, a context switch that involves switching to a different page table (e.g. a process-to-process context switch) requires a flush of the entire TLB. With the feature, it only requires a change to the context id designated as "currently allowed". The benefit of this comes up when a back-and-forth set of context switches (e.g. from process 1 to process 2 and back to process 1) occurs "quickly enough" that TLB entries of the newly-switched-into context still reside in the TLB cache.
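To make the tagging idea concrete, here is a toy Python model of an ASID/PCID-tagged TLB (the class and method names are mine, purely illustrative, not from any kernel or CPU spec): a context switch changes the current tag instead of flushing, so entries from a previously-running context survive and can hit again when that context resumes.

```python
class TaggedTLB:
    """Toy model of an ASID/PCID-tagged TLB (illustrative sketch only)."""

    def __init__(self):
        self.entries = {}       # (asid, virtual_page) -> physical_page
        self.current_asid = 0

    def context_switch(self, asid):
        # With ASID/PCID tagging, a switch is just a tag change -- no flush.
        self.current_asid = asid

    def fill(self, vpage, ppage):
        # A page-table-walk result is cached under the current context's tag.
        self.entries[(self.current_asid, vpage)] = ppage

    def lookup(self, vpage):
        # Only entries tagged with the currently allowed ASID can hit;
        # entries with a different tag are ignored, not evicted.
        return self.entries.get((self.current_asid, vpage))


tlb = TaggedTLB()
tlb.context_switch(1)                # run process 1
tlb.fill(0x400, 0x9000)              # process 1 takes a TLB miss, fills an entry
tlb.context_switch(2)                # switch to process 2: no flush needed
assert tlb.lookup(0x400) is None     # process 2 cannot hit process 1's entry
tlb.context_switch(1)                # quick switch back to process 1
assert tlb.lookup(0x400) == 0x9000   # the entry survived the round trip
```

Without the tag, the switch to process 2 would have had to empty `entries` wholesale, and the switch back would start cold.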
> With modern x86 CPUs holding >1K entries in their L2 TLB caches (sometimes referred to as the STLB), and each entry mapping a 2MB or 4KB virtual region to a physical page, the possibility of such reuse becomes interesting on heavily loaded systems that do a lot of process-to-process context switching. It's important to note that in virtually all modern operating systems, thread-to-thread context switches do not require TLB flushing, and remain within the same PCID, because they do not require switching the page table. In addition, UNTIL NOW, most modern operating systems implemented user-to-kernel and kernel-to-user context switching without switching page tables, so no TLB flushing or ASID/PCID switching was required in system calls or interrupts.
>
> The PCID feature has been a "cool, interesting, but not critical" feature to know about in most Linux/x86 environments for these main reasons:
>
> 1. Linux kernels did not make use of PCID until 4.14. So even though it's been around and available in hardware, it didn't make any difference.
>
> 2. It's been around and supported in hardware "forever", since 2010 (apparently added with Westmere), so it's not new or exciting.
>
> 3. The benefits of PCID-based retention of TLB entries in the TLB cache, once supported by the OS, would only show up when process-to-process context switching is rapid enough to matter. While heavily loaded systems with lots of active processes (not threads) that rapidly switch would benefit, systems with a reasonable number of [potentially heavily] multi-threaded processes wouldn't really be affected or see a benefit.
>
> This all changed with Meltdown.
>
> The Meltdown fixes in the various distros, under name variants like "pti", "KPTI", "kaiser" and "KAISER", all have one key thing in common: they use completely separate page tables for user-mode execution and for kernel-mode execution, in order to make sure that kernel mappings are not available [to the processor] as the basis for any speculative operations. Where previously a user process had a single page table with entries for both user-space and kernel-space mappings in it (with the kernel mappings having access enforced by protection rules), it now has two page tables: a "user-only" table containing only the user-accessible mappings (this table is referred to as "user" in some variants and "shadow" in others), and another table containing both the kernel and the user mappings (referred to as "kernel" in the variants I've seen so far). When running user-mode code, the user-only table is the currently active table that the processor walks on a TLB miss, and when running kernel code, the "kernel" table is. System calls switch from using the user-only table to using the kernel table, perform their kernel-code work, and then switch back to the user-only table before returning to user code.
>
> When a processor has the PCID feature, this back-and-forth switching between page tables is achieved by using separate PCIDs for the two tables associated with the process. For kernels that did not previously have PCID support (which is all kernels prior to 4.14, so the vast majority of kernels in use at the time of this writing), the Meltdown fix variants seem to use constant PCID values for this purpose (e.g. 0 for kernel and 128 for user). For later kernels, where the PCID-to-process relationship is maintained on each CPU, the PCID space is split in half (e.g. uPCID = kPCID + 2048).
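The split-in-half arithmetic can be sketched like this (a toy Python sketch: the function name and the exact masking are mine; the 12-bit PCID width is from the x86-64 architecture, and the +2048 offset mirrors the uPCID = kPCID + 2048 example above, not any specific kernel's source):

```python
PCID_BITS = 12                          # x86-64 PCIDs are 12-bit values: 0..4095
USER_PCID_FLAG = 1 << (PCID_BITS - 1)   # 2048: marks the "user" half of the space

def kernel_and_user_pcid(asid):
    # The kernel-table tag lives in the bottom half of the PCID space;
    # the matching user-only-table tag is the same value plus 2048.
    # Switching tables then only toggles the top PCID bit -- no TLB flush.
    kpcid = asid & (USER_PCID_FLAG - 1)
    upcid = kpcid + USER_PCID_FLAG
    return kpcid, upcid

print(kernel_and_user_pcid(6))          # -> (6, 2054)
```

The point of the pairing is that a syscall entry/exit only flips which half of the tag space is "currently allowed", leaving both contexts' TLB entries cached.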
> Either way, the switch back and forth between the user-only table and the kernel table does involve telling the CPU that the page table root and the PCID have changed, but does not require or force a TLB flush.
>
> When a processor does NOT have the PCID feature, things get ugly. Each system call and each user-to-kernel-to-user transition (like an interrupt) would be required to flush the TLB twice (once after each switch), which means two terrible things happen:
>
> 1. System calls [which are generally fairly short] are pretty much guaranteed to incur TLB misses on first access to any data and code within the call, with each miss taking 1-7 steps to walk through the page tables in memory. This has an obvious impact on workloads that involve frequent system calls, as the length of each system call will now be longer.
>
> 2. Each system call and each user-to-kernel-to-user transition flushes the entire cache of user-space TLB entries, which means that *after* the system call/transition, 100s or 1000s of additional TLB misses will be incurred, the walks for many of which can end up missing in L2/L3. This will affect applications and systems that do not necessarily have a "very high" rate of system calls. The more the TLB has been helping your performance, the more this impact will be felt, and TLBs have been silently helping you for decades. It is enough for only a few hundred or a few thousand user-to-kernel-to-user transitions per second to be happening for this impact to be sorely felt. And guess what: in most normal configurations, interrupts (timer, TLB-invalidate, etc.) all cause such transitions on a regular and frequent basis.
>
> The performance impact of needing to fully flush the TLB on each transition is apparently high enough that at least some of the Meltdown-fixing variants I've read through (e.g. the KAISER variant in RHEL7/RHEL6 and their CentOS brethren) are not willing to take it.
> Instead, some of those variants appear to implicitly turn off the dual-page-table-per-process security measure if the processor they are running on does not have PCID capability.
>
> The bottom line so far is: you REALLY want PCID in your processor. Without it, you may be running insecurely (Meltdown fixes turned off by default), or you may run so slowly you'll be wishing for a security intrusion to put you out of your misery.
>
> Ok. So far, you'd think this whole thing boils down to "once I update my Linux distro with the latest fixes, I just want to make sure I'm not running on ancient hardware". And since virtually all x86 hardware made this decade has PCID support, everything is fine. Right? That was my first thought too. Then I went and checked a bunch of systems. Most of the Linux instances I looked at had no pcid feature, and all of them were running on modern hardware. Oh Shit.
>
> The quickest way to check whether or not you have PCID is to grep for "pcid" in /proc/cpuinfo. If it's there, you're good. You can stop reading and go on to worrying about the other performance and security impacts being discussed everywhere else. But if it's not there, you are in trouble. You now have a choice between running insecurely (turn pti off) and having performance so bad that some of the security fixes out there will refuse to secure you. Or you can act (which often means "go scream at someone") and get that PCID feature you now really really need.
>
> So, why would you not have PCID?
>
> It turns out that because PCID was so boring and non-exciting, and Linux didn't even use it until a couple of months ago, it's been withheld from many guest-OS instances when running on modern hardware and modern hypervisors.
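The grep check mentioned above is easy to script if you want to sweep a fleet of machines; here's a small Python helper (a sketch of the same /proc/cpuinfo flags check; the function name is mine):

```python
def has_pcid(cpuinfo_text):
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists pcid."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # A flags line looks like: "flags\t\t: fpu vme ... pcid ..."
            _, _, flags = line.partition(":")
            if "pcid" in flags.split():
                return True
    return False


# On a Linux box you would feed it the real file:
#   has_pcid(open("/proc/cpuinfo").read())
sample = "processor : 0\nflags : fpu vme pcid sse2\n"
print(has_pcid(sample))   # -> True
```

Matching the whole-word token (rather than a substring) avoids false positives from similarly named flags.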
> In my quick and informal polling I have so far found that:
>
> - Most of the KVM guests I personally looked at did NOT have pcid
> - All the VMware guests I personally looked at had pcid
> - About half the AWS instances I personally looked at did NOT have pcid, and the other half did.
>
> [I encourage others to add their experiences, and e.g. enrich this with a table of PCID-capability on known instance types on cloud platforms]
>
> The actual Bottom Line:
>
> - On any system that does not currently show "pcid" in the flags line of /proc/cpuinfo, Meltdown is a bigger issue than "install latest updates".
>
> - PCID is now a critical feature for both security and performance.
>
> - Many existing Linux guest instances don't have PCID. Including many Cloud instances.
>
> Go get your PCID!
>
> --
> You received this message because you are subscribed to the Google Groups "mechanical-sympathy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
