On Tue, Nov 17, 2020 at 06:19:48PM -0500, Joel Fernandes (Google) wrote:
> Core-scheduling prevents hyperthreads in usermode from attacking each
> other, but it does not do anything about one of the hyperthreads
> entering the kernel for any reason. This leaves the door open for MDS
> and L1TF attacks with concurrent execution sequences between
> hyperthreads.
> 
> This patch therefore adds support for protecting all syscall and IRQ
> kernel mode entries. Care is taken to track the outermost usermode exit
> and entry using per-cpu counters. In cases where one of the hyperthreads
> enter the kernel, no additional IPIs are sent. Further, IPIs are avoided
> when not needed - example: idle and non-cookie HTs do not need to be
> forced into kernel mode.
> 
> More information about attacks:
> For MDS, it is possible for syscalls, IRQ and softirq handlers to leak
> data to either host or guest attackers. For L1TF, it is possible to leak
> to guest attackers. There is no possible mitigation involving flushing
> of buffers to avoid this since the execution of attacker and victims
> happen concurrently on 2 or more HTs.

>  .../admin-guide/kernel-parameters.txt         |  11 +
>  include/linux/entry-common.h                  |  12 +-
>  include/linux/sched.h                         |  12 +
>  kernel/entry/common.c                         |  28 +-
>  kernel/sched/core.c                           | 241 ++++++++++++++++++
>  kernel/sched/sched.h                          |   3 +
>  6 files changed, 304 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index bd1a5b87a5e2..b185c6ed4aba 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4678,6 +4678,17 @@
>  
>       sbni=           [NET] Granch SBNI12 leased line adapter
>  
> +     sched_core_protect_kernel=
> +                     [SCHED_CORE] Pause SMT siblings of a core running in
> +                     user mode, if at least one of the siblings of the core
> +                     is running in kernel mode. This is to guarantee that
> +                     kernel data is not leaked to tasks which are not trusted
> +                     by the kernel. A value of 0 disables protection, 1
> +                     enables protection. The default is 1. Note that protection
> +                     depends on the arch defining the _TIF_UNSAFE_RET flag.
> +                     Further, for protecting VMEXIT, arch needs to call
> +                     KVM entry/exit hooks.
> +
>       sched_debug     [KNL] Enables verbose scheduler debug messages.
>  
>       schedstats=     [KNL,X86] Enable or disable scheduled statistics.

So I don't like the parameter name, it's too long. Also I don't like it
because it's a boolean.

You're adding syscall,irq,kvm under a single knob where they're all due
to different flavours of broken. Different hardware might want/need
different combinations.

Hardware without MDS but with L1TF wouldn't need the syscall hook, but
you're not giving a choice here. And this is generic code, you can't
assume stuff like this.
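
To make the objection concrete, here is a minimal userspace sketch of what a
per-entry-path knob could look like in place of the boolean. Everything in it
is an assumption for illustration only: the PROTECT_* bits, the
parse_protect_mask() helper and the "syscall,irq,kvm" token syntax are
hypothetical and appear nowhere in the patch or the kernel.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical per-entry-path protection bits; not taken from the patch. */
enum {
	PROTECT_SYSCALL = 1u << 0,
	PROTECT_IRQ     = 1u << 1,
	PROTECT_KVM     = 1u << 2,
};

/*
 * Parse a comma-separated list such as "irq,kvm" into a bitmask.
 * Returns 0 on success, -1 on an unknown or empty token.
 * An empty string yields an empty mask (no protection selected).
 */
static int parse_protect_mask(const char *arg, unsigned int *mask)
{
	unsigned int m = 0;

	assert(arg != NULL && mask != NULL);

	while (*arg) {
		/* Length of the current token, up to the next ',' or NUL. */
		size_t len = strcspn(arg, ",");

		if (len == 7 && !strncmp(arg, "syscall", 7))
			m |= PROTECT_SYSCALL;
		else if (len == 3 && !strncmp(arg, "irq", 3))
			m |= PROTECT_IRQ;
		else if (len == 3 && !strncmp(arg, "kvm", 3))
			m |= PROTECT_KVM;
		else
			return -1;

		arg += len;
		if (*arg == ',')
			arg++;
	}

	*mask = m;
	return 0;
}
```

With something along these lines, a machine that has L1TF but not MDS could
leave "syscall" out of its mask, and each arch could pick its own default
instead of inheriting a single on/off switch.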
