On Fri, Jun 22, 2018 at 11:47 AM, Andy Lutomirski <[email protected]> wrote:
>
>
>
>> On Jun 22, 2018, at 11:29 AM, H. Peter Anvin <[email protected]> wrote:
>>
>>> On 06/22/18 07:24, Andy Lutomirski wrote:
>>>
>>> That RPL3 part is false.  The following program does:
>>>
>>> #include <stdio.h>
>>>
>>> int main()
>>> {
>>>     unsigned short sel;
>>>     asm volatile ("mov %%ss, %0" : "=rm" (sel));
>>>     sel &= ~3;
>>>     printf("Will write 0x%hx to GS\n", sel);
>>>     asm volatile ("mov %0, %%gs" :: "rm" (sel & ~3));
>>>     asm volatile ("mov %%gs, %0" : "=rm" (sel));
>>>     printf("GS = 0x%hx\n", sel);
>>>     return 0;
>>> }
>>>
>>> prints:
>>>
>>> Will write 0x28 to GS
>>> GS = 0x28
>>>
>>> The x86 architecture is *insane*.
>>>
>>> Other than that, this patch seems generally sensible.  But my
>>> objection that it's incorrect with FSGSBASE enabled for %fs and %gs
>>> still applies.
>>>
>>
>> Ugh, you're right... I misremembered.  The CPL simply overrides the RPL
>> rather than trapping.
>>
>> We still need to give legacy applications which have zero idea about the
>> separate bases that apply only to 64-bit mode a way to DTRT.  Requiring
>> these old crufty applications to do something new is not an option.
>
>>
>> As ugly as it is, I'm thinking the Right Thing is to simply make it a
>> part of the Linux ABI that if the FS or GS selector registers point into
>> the LDT then we will requalify them; if a 64-bit app does that then they
>> get that behavior.  This isn't something that will happen
>> asynchronously, and if a 64-bit process loads an LDT value into FS or
>> GS, they are considered to have opted in to that behavior.
>
> But the old and crusty apps don’t depend on requalification because we never
> used to do it.
>
> I’m not convinced we ever need to refresh the base. In fact, we could start
> preserving the base of LDT-referencing FS/GS across context switches even
> without FSGSBASE at some minor performance cost, but I don’t really see the
> point.  I still think my proposed semantics are easy to implement and preserve
> the ABI even if they have the sad property that the FSGSBASE behavior and the
> non-FSGSBASE behavior end up different.
>

There's another reasonable solution: do exactly what your patch does, minus the bugs.  We would need to get the RPL != 3 case right (easy) and handle the case where there's a non-running thread using the selector in question.  The latter is probably best handled by adding a flag to thread_struct that says "fsbase needs reloading from the descriptor table" and that only applies if the selector is in the LDT or TLS area.  Or we could hijack a high bit in the saved selector, but then we'd need to update everything that uses those fields.
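
Very roughly, and completely untested, the flag variant might look something like the sketch below.  This is plain C that builds in user space, not kernel code: struct sketch_thread, the fsbase_stale flag and both helper names are made up for illustration, and the TLS index range (12..14) is my assumption about the x86-64 GDT layout and should be checked against asm/segment.h.  The only architectural facts it relies on are the selector layout (RPL in bits 0-1, TI in bit 2, index above that).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural selector layout: bits 0-1 = RPL, bit 2 = TI (1 = LDT). */
#define SEL_RPL_MASK 0x3u
#define SEL_TI_LDT   0x4u

/* ASSUMPTION: GDT slots used for TLS on x86-64 Linux; verify in asm/segment.h. */
#define SKETCH_GDT_TLS_MIN 12u
#define SKETCH_GDT_TLS_MAX 14u

/* Hypothetical stand-in for the FS-related part of thread_struct. */
struct sketch_thread {
    uint16_t fsindex;      /* saved FS selector */
    uint64_t fsbase;       /* saved FS base */
    bool     fsbase_stale; /* hypothetical: reload base from the descriptor table */
};

/* True if the selector points into the LDT or the (assumed) GDT TLS slots. */
static bool sel_is_ldt_or_tls(uint16_t sel)
{
    unsigned int idx = sel >> 3;

    if (sel & SEL_TI_LDT)
        return true;
    return idx >= SKETCH_GDT_TLS_MIN && idx <= SKETCH_GDT_TLS_MAX;
}

/*
 * Would be called for each non-running thread when the descriptor named by
 * changed_sel is rewritten.  RPL is masked off on both sides, since a thread
 * can hold the same descriptor with any RPL (the RPL != 3 case).
 */
static void sketch_mark_stale(struct sketch_thread *t, uint16_t changed_sel)
{
    if (!sel_is_ldt_or_tls(t->fsindex))
        return;
    if ((t->fsindex & ~SEL_RPL_MASK) == (changed_sel & ~SEL_RPL_MASK))
        t->fsbase_stale = true;
}
```

The context-switch path would then reload the base from the descriptor table when the flag is set and clear it, and GS would need the same treatment.  The high-bit variant would fold fsbase_stale into fsindex instead, which is why every user of that field would have to learn to mask it.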

