On 13/02/2025 11:24 pm, Jennifer Miller wrote:
> On Thu, Feb 13, 2025 at 09:24:18PM +0000, Andrew Cooper wrote:
>>>> ; swap stacks as normal
>>>> mov QWORD PTR gs:[rip+0x7f005f85],rsp # 0x6014 <cpu_tss_rw+20>
>>>> mov rsp,QWORD PTR gs:[rip+0x7f02c56d] # 0x2c618 <pcpu_hot+24>
>> ... these are memory accesses using the user %gs. As you note a few
>> lines lower, %gs isn't safe at this point.
>>
>> A cunning attacker can make gs:[rip+0x7f02c56d] be a read-only mapping,
>> at point we'll have loaded an attacker controlled %rsp, then take #PF
>> trying to spill %rsp into pcpu_hot, and now we're running the pagefault
>> handler on an attacker controlled stack and gsbase.
>>
> I don't follow, the spill of %rsp into pcpu_hot occurs first, before we
> would move to the attacker controlled stack. This is Intel asm syntax,
> sorry if that was unclear.

No, sorry. It's clearly written; I simply wasn't paying enough attention.

> Still, I hadn't considered misusing readonly/unmapped pages on the GPR
> register spill that follows. Could we enforce that the stack pointer we get
> be page aligned to prevent this vector? So that if one were to attempt to
> point the stack to readonly or unmapped memory they should be guaranteed to
> double fault?

Hmm. Espfix64 does involve #DF recovering from a write to a read-only
stack. (This broken corner of x86 is also fixed in FRED. We fixed a
*lot* of things.)

As long as the #DF handler can be updated to safely distinguish espfix64
from this entrypoint attack, this seems like it might mitigate the
read-only case.

> I think we can do the overwrite at any point before actually calling into
> the individual syscall handlers, really anywhere before potentially
> hijacked indirect control flow can occur and then restore it just after
> those return e.g., for the 64-bit path I am currently overwriting it at the
> start of do_syscall_64 and then restoring it just before
> syscall_exit_to_user_mode. I'm not sure if there is any reason to do it
> sooner while we'd still be register constrained.

I don't follow. If any "bad" execution is found in an entrypoint, Linux
needs to panic(). Detecting the malice involves clobbering an in-use
stack, and there's no ability to safely recover.

~Andrew
