On Mon, Nov 19, 2018 at 08:39:41PM +0100, Jiri Kosina wrote:
> On Mon, 19 Nov 2018, Andrea Arcangeli wrote:
>
> > Generally speaking the untrusted code that would try to use spectrev2
> > to attack the other processes is more likely to run inside SECCOMP
> > jail than outside, so if SECCOMP should be used as a best effort
> > heuristic to decide when to enable STIBP, it would make more sense to
> > enable STIBP outside SECCOMP, and not inside. I.e. the exact opposite
> > of what you're proposing above.
>
> Hmm, that's a very good point. But I actually don't see why both
> directions wouldn't be possible real-blackhat-world scenarios. So perhaps
> we'd want, under the basic assumption that "SECCOMP should really be
> sandboxed from outside interventions and from causing them from inside as
> well", flush on both switch-to-seccomp and switch-from-seccomp?
STIBP doesn't flush, so I don't see how "flush" and "switch" fit the
STIBP discussion. Flush as in IBPB on switch-to-seccomp and
switch-from-seccomp? IBPB is not going to solve the HT attack, and
STIBP is only about the HT attack. IBPB only solves the user-to-user
context switch attack.

I just don't see SECCOMP as a good fit for a default-on heuristic,
because there would be more arguments to enable STIBP outside seccomp
than inside. Even if you ignore that, SECCOMP is used by pretty much
everything, including wrapping through containers and systemd, so it
would still leave lots of software running with STIBP (and for all the
wrong reasons too).

By contrast, not-dumpable was a much better fit for a per-process
enablement heuristic, because not-dumpable code is more likely to be
the code that needs protection from attack, and less likely to be the
malicious code that got exploited (or was untrusted to begin with, like
DRM binary blobs or public cloud usages). However, as mentioned in this
thread, suid calls can set the non-dumpable flag, so it's not ideal
either. We'd need to track which processes explicitly made themselves
non-dumpable with SUID_DUMP_DISABLE.

> So if I understand you correctly, what you are proposing here is to keep
> the current code, but just switch the default, and make it
> runtime/boottime togglable?

Deciding the default on this stuff is nightmarish: there's no good
default, and the best system-wide default is data and workload
dependent. This is precisely why it should be runtime togglable and not
just boot-time togglable, in my view.

I don't disagree with default disabled; that may be safer to avoid
breaking workloads near full capacity (the same reason HT isn't
disabled by default for L1TF). We have to draw a line somewhere with
the default.

The ASLR argument from Tim's patchset cover letter, combined with PID
namespaces, should go a long way to mitigate the HT attack too, even
without STIBP.
In my understanding you need to know what's running on the sibling
thread to derandomize ASLR; otherwise you'd potentially be attacking
glibc or some lib that, yes, is mapped by all processes, but not at the
same address in all of them. You need to restrict the measurement
during ASLR derandomization to the exact time the target process is
running on the sibling (any thread of the process would do).

Now, assuming there's no pid namespace that prevents seeing what's
running on the sibling thread, how hard it is to derandomize ASLR
depends on the scheduling jitter and on the size and hw hashfn of the
BTB (which varies across CPUs).

According to the original paper, some non-Linux OS leaves many
low-order bits of its ASLR unrandomized, and the high bits don't go
into the hashfn of the BTB (incidentally, the ASLR derandomization
technique to attack userland was apparently not tested on Linux). We
should be randomizing all bits down to bit 12 (not bit 15), so for us
the derandomizing should be 256 times more expensive? (At least until
the day we map .text into filesystem THP pagecache...)

The more randomized bits that are part of the BTB hashfn input, the
more computationally expensive it becomes to derandomize ASLR, and the
more the random scheduling jitter will interfere with the longer
measurement. So hopefully the complexity of the attack grows more than
linearly with the number of random ASLR low bits that get into the BTB
hashfn input. This is an optimistic guess though.

Overall, for on-prem cloud usages where no random malicious code can
run on the CPU by design, and this is only a post-exploitation
robustness issue, it doesn't seem a major concern if STIBP is disabled,
provided pid namespaces and ASLR have been fully leveraged by default
in Kubernetes containers. I'm curious to hear other people's opinions
on this too, however.
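The scaling argument above can be put into a toy model. This is only a
sketch under loud assumptions: it treats the attack as a naive brute
force in which every randomized address bit that feeds the BTB hashfn
doubles the search space, and it ignores scheduling jitter entirely.
The function names and the top_hash_bit parameter are hypothetical;
which address bits a given CPU really hashes is not stated in this
thread.

```python
def randomized_hash_bits(lowest_random_bit: int, top_hash_bit: int) -> int:
    """Count address bits that are both randomized by ASLR and fed to the hash.

    Assumes ASLR randomizes every bit from lowest_random_bit upward and the
    (hypothetical) BTB hash consumes bits 0..top_hash_bit inclusive.
    """
    return max(0, top_hash_bit - lowest_random_bit + 1)


def search_space(lowest_random_bit: int, top_hash_bit: int) -> int:
    """Candidates a naive brute-force attacker has to tell apart."""
    return 1 << randomized_hash_bits(lowest_random_bit, top_hash_bit)


def relative_cost(fine_bit: int, coarse_bit: int, top_hash_bit: int) -> float:
    """How much larger the search space is with finer-grained randomization."""
    return search_space(fine_bit, top_hash_bit) / search_space(coarse_bit, top_hash_bit)


if __name__ == "__main__":
    # top_hash_bit=19 is an arbitrary placeholder, not a measured value.
    for lowest in (12, 15, 16, 20):
        print(lowest, search_space(lowest, top_hash_bit=19))
```

In this model the cost ratio between two ASLR granularities depends only
on how many extra randomized bits land inside the hash input, so the
result is very sensitive to the assumed top_hash_bit.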
Downstream we always provided ibrs_enabled=2/3, which already implies
STIBP enabled at all times, and which, unlike STIBP alone, also
protects against guest attack on host userland within the same
context. It's tunable at runtime. It's not enabled by default, for
similar considerations as above for STIBP.

I think it's good to give users the choice to be 100% secure against
everything as an opt-in (ideally without requiring a reboot, which
actually helps the evaluation of the performance impact, which is
obviously workload dependent too).

As an alternative to STIBP, it would also be possible to alter the
scheduler so it never runs different processes on different siblings of
the same core unless they can ptrace each other (the exact same ptrace
check used to decide whether to run IBPB to protect against the
context-switch spectrev2 attack, except that here it needs to be
checked in both directions). This way you would still have zero penalty
for all kernel builds, all threaded programs, etc., while retaining
full security against the HT attack (and IBPB takes care of the rest
with the same ptrace check).

With several containerized single-threaded workloads it would be slower
than STIBP, though, because it would leave siblings idle. However, to
take care of that detail we could also let any process under a pid
namespace run alongside any other process under a pid namespace, even
if they cannot ptrace each other. Not sure if it's worth it, but it
remains a possibility that may perform better than STIBP. It would also
take care, in general, of cache attacks on non-constant-time algorithms
etc., not just the spectrev2 HT attack.

Thanks,
Andrea
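The sibling co-scheduling rule described above can be sketched as a
small user-space model. This is not kernel code and not a proposed
patch: Task, may_ptrace, may_share_core and the allow_pidns_pairs knob
are all hypothetical names, and may_ptrace is a deliberately crude
stand-in (same uid and dumpable, or root tracer) for the kernel's real
ptrace access check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    pid: int
    tgid: int                     # thread group id: equal for threads of one process
    uid: int
    dumpable: bool = True
    in_child_pidns: bool = False  # runs under a non-initial pid namespace


def may_ptrace(tracer: Task, tracee: Task) -> bool:
    """Crude stand-in for a ptrace access check (assumption, see lead-in)."""
    if tracer.uid == 0:
        return True
    return tracer.uid == tracee.uid and tracee.dumpable


def may_share_core(a: Task, b: Task, allow_pidns_pairs: bool = False) -> bool:
    """Decide whether a and b may run on HT siblings of the same core."""
    # Threads of the same process share an address space already.
    if a.tgid == b.tgid:
        return True
    # The ptrace check has to hold in both directions.
    if may_ptrace(a, b) and may_ptrace(b, a):
        return True
    # Optional relaxation: pair up any two tasks under pid namespaces.
    if allow_pidns_pairs and a.in_child_pidns and b.in_child_pidns:
        return True
    return False
```

A scheduler using such a predicate would leave a sibling idle whenever
it returns False for every runnable candidate, which is where the
slowdown for many single-threaded containers would come from; the
allow_pidns_pairs relaxation models the trade-off of accepting that
pairing anyway.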