Package: release.debian.org
Severity: normal
User: release.debian....@packages.debian.org
Usertags: unblock
X-Debbugs-Cc: car...@debian.org,k...@debian.org,b...@debian.org
Control: tags -1 + d-i

Dear release team, Cyril

Please unblock package linux. This is actually a pre-approval request,
as time is *very* tight now for getting fixes into bullseye.
Two new CVEs were disclosed earlier this week[1],
CVE-2021-34556 and CVE-2021-35477, in which the BPF protection against
Speculative Store Bypass can be bypassed and taken advantage of to
disclose arbitrary kernel memory.

 [1] https://www.openwall.com/lists/oss-security/2021/08/01/3

I cherry-picked the related commits which were queued for the next
5.10.y release. As before, those two CVEs are exploitable through
unprivileged BPF programs; this has recently been widely discussed,
e.g. in [2], and it has been requested in Debian to disable
unprivileged BPF by default[3], which is sensible to do.

 [2] https://lwn.net/Articles/860597/
 [3] https://bugs.debian.org/990411

Upstream added in 5.13-rc4 a new kconfig knob to disable unprivileged
bpf by default, but without making it irreversible. I cherry-picked
this commit as well and set BPF_UNPRIV_DEFAULT_OFF, closing #990411.
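
For completeness, with BPF_UNPRIV_DEFAULT_OFF set the new default is
reflected in /proc/sys/kernel/unprivileged_bpf_disabled. A minimal
sketch (illustration only, not part of the debdiff) that reads the
sysctl and follows the value semantics documented in the backported
patch could look like this:

  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/sys/kernel/unprivileged_bpf_disabled", "r");
          int val;

          if (!f || fscanf(f, "%d", &val) != 1) {
                  perror("unprivileged_bpf_disabled");
                  return 1;
          }
          fclose(f);

          /* Value semantics from the backported documentation:
           * 0 - unprivileged calls to bpf() are enabled
           * 1 - disabled, cannot be cleared from the running kernel
           * 2 - disabled by default, an admin can still write 0 or 1
           */
          printf("unprivileged_bpf_disabled = %d\n", val);
          return 0;
  }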

Attached is a filtered debdiff (excluding debian/rules.gen,
debian/config.defines.dump). The changelog is not yet finalized and
will be switched from UNRELEASED -> unstable accordingly.

Would you agree to such a very short-notice upload still targeting
bullseye?

Regards,
Salvatore
diff -Nru linux-5.10.46/debian/changelog linux-5.10.46/debian/changelog
--- linux-5.10.46/debian/changelog      2021-07-28 07:55:40.000000000 +0200
+++ linux-5.10.46/debian/changelog      2021-08-02 12:36:15.000000000 +0200
@@ -1,3 +1,16 @@
+linux (5.10.46-4) UNRELEASED; urgency=medium
+
+  * bpf: Introduce BPF nospec instruction for mitigating Spectre v4
+    (CVE-2021-34556, CVE-2021-35477)
+  * bpf: Fix leakage due to insufficient speculative store bypass mitigation
+    (CVE-2021-34556, CVE-2021-35477)
+  * bpf: Remove superfluous aux sanitation on subprog rejection
+  * Ignore ABI changes for bpf_offload_dev_create and bpf_verifier_log_write
+  * bpf: Add kconfig knob for disabling unpriv bpf by default
+  * init: Enable BPF_UNPRIV_DEFAULT_OFF (Closes: #990411)
+
+ -- Salvatore Bonaccorso <car...@debian.org>  Mon, 02 Aug 2021 12:36:15 +0200
+
 linux (5.10.46-3) unstable; urgency=medium
 
   * [armhf] Add mdio-aspeed to nic-modules.
diff -Nru linux-5.10.46/debian/config/config linux-5.10.46/debian/config/config
--- linux-5.10.46/debian/config/config  2021-07-26 22:01:29.000000000 +0200
+++ linux-5.10.46/debian/config/config  2021-08-02 12:36:15.000000000 +0200
@@ -6425,6 +6425,10 @@
 CONFIG_BPF_LSM=y
 CONFIG_BPF_SYSCALL=y
 # CONFIG_BPF_JIT_ALWAYS_ON is not set
+# Debian backport of b24abcff918a ("bpf, kconfig: Add consolidated menu entry
+# for bpf with core options") in 5.13-rc4 adds the configuration option to
+# init/Kconfig and needs to be moved once rebasing to 5.13-rc4 and later.
+CONFIG_BPF_UNPRIV_DEFAULT_OFF=y
 CONFIG_USERFAULTFD=y
 CONFIG_RSEQ=y
 # CONFIG_DEBUG_RSEQ is not set
diff -Nru linux-5.10.46/debian/config/defines 
linux-5.10.46/debian/config/defines
--- linux-5.10.46/debian/config/defines 2021-07-26 22:01:29.000000000 +0200
+++ linux-5.10.46/debian/config/defines 2021-08-02 12:36:15.000000000 +0200
@@ -4,6 +4,8 @@
  __cpuhp_*
  __udp_gso_segment
  bpf_analyzer
+ bpf_offload_dev_create
+ bpf_verifier_log_write
  cxl_*
  dax_flush
  ieee80211_nullfunc_get
diff -Nru linux-5.10.46/debian/control.md5sum 
linux-5.10.46/debian/control.md5sum
--- linux-5.10.46/debian/control.md5sum 2021-07-28 07:55:40.000000000 +0200
+++ linux-5.10.46/debian/control.md5sum 2021-08-02 12:36:15.000000000 +0200
@@ -1,5 +1,5 @@
 a46eb172db472ccbe5364b1de7eeb2a0  debian/bin/gencontrol.py
-70f7a3c76de436ac37c57d91d4461d9a  debian/build/version-info
+d226894df595474dd82b3dcb18e67c6d  debian/build/version-info
 fe4456d48e3218fb8980c8577d03a7ae  debian/templates/control.config.in
 9509e39d8e60906ebd5ee4a8b0355d25  debian/templates/control.docs.in
 358db3af53a223fe60ae89c7a481609f  debian/templates/control.docs.meta.in
@@ -38,7 +38,7 @@
 381bc892fd36ef7ea5327f649b99cb98  
debian/templates/sourcebin.meta.maintscript.in
 814dda166c7e3ef02e6e259e805ac66a  debian/templates/tests-control.image.in
 33d71bfd398d2f9b3bc5c0193b67d17e  debian/templates/tests-control.main.in
-0e6309e89ee090e9c132f0e1f869c8ef  debian/config/defines
+49e9b3a63832ab55e378afc715f95789  debian/config/defines
 59a811890d2e7129bec940075850f11f  debian/config/alpha/defines
 026ce5cdad7814c28f4fd87589786719  debian/config/amd64/defines
 44bff3917069a99eeb20ceff24609dda  debian/config/arm64/defines
diff -Nru 
linux-5.10.46/debian/patches/bugfix/all/bpf-Add-kconfig-knob-for-disabling-unpriv-bpf-by-def.patch
 
linux-5.10.46/debian/patches/bugfix/all/bpf-Add-kconfig-knob-for-disabling-unpriv-bpf-by-def.patch
--- 
linux-5.10.46/debian/patches/bugfix/all/bpf-Add-kconfig-knob-for-disabling-unpriv-bpf-by-def.patch
  1970-01-01 01:00:00.000000000 +0100
+++ 
linux-5.10.46/debian/patches/bugfix/all/bpf-Add-kconfig-knob-for-disabling-unpriv-bpf-by-def.patch
  2021-08-02 12:36:15.000000000 +0200
@@ -0,0 +1,134 @@
+From: Daniel Borkmann <dan...@iogearbox.net>
+Date: Tue, 11 May 2021 22:35:17 +0200
+Subject: bpf: Add kconfig knob for disabling unpriv bpf by default
+Origin: https://git.kernel.org/linus/08389d888287c3823f80b0216766b71e17f0aba5
+
+Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
+If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.
+
+This still allows a transition of 2 -> {0,1} through an admin. Similarly,
+this also still keeps 1 -> {1} behavior intact, so that once set to permanently
+disabled, it cannot be undone aside from a reboot.
+
+We've also added extra2 with max of 2 for the procfs handler, so that an admin
+still has a chance to toggle between 0 <-> 2.
+
+Either way, as an additional alternative, applications can make use of CAP_BPF
+that we added a while ago.
+
+Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
+Signed-off-by: Alexei Starovoitov <a...@kernel.org>
+Link: 
https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.dan...@iogearbox.net
+[Salvatore Bonaccorso: Backport to 5.10.y: Filename change from
+kernel/bpf/Kconfig back to init/Kconfig]
+---
+ Documentation/admin-guide/sysctl/kernel.rst | 17 +++++++++---
+ kernel/bpf/Kconfig                          | 10 +++++++
+ kernel/bpf/syscall.c                        |  3 ++-
+ kernel/sysctl.c                             | 29 +++++++++++++++++----
+ 4 files changed, 50 insertions(+), 9 deletions(-)
+
+--- a/Documentation/admin-guide/sysctl/kernel.rst
++++ b/Documentation/admin-guide/sysctl/kernel.rst
+@@ -1457,11 +1457,22 @@ unprivileged_bpf_disabled
+ =========================
+ 
+ Writing 1 to this entry will disable unprivileged calls to ``bpf()``;
+-once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` will return
+-``-EPERM``.
++once disabled, calling ``bpf()`` without ``CAP_SYS_ADMIN`` or ``CAP_BPF``
++will return ``-EPERM``. Once set to 1, this can't be cleared from the
++running kernel anymore.
+ 
+-Once set, this can't be cleared.
++Writing 2 to this entry will also disable unprivileged calls to ``bpf()``,
++however, an admin can still change this setting later on, if needed, by
++writing 0 or 1 to this entry.
+ 
++If ``BPF_UNPRIV_DEFAULT_OFF`` is enabled in the kernel config, then this
++entry will default to 2 instead of 0.
++
++= =============================================================
++0 Unprivileged calls to ``bpf()`` are enabled
++1 Unprivileged calls to ``bpf()`` are disabled without recovery
++2 Unprivileged calls to ``bpf()`` are disabled
++= =============================================================
+ 
+ watchdog
+ ========
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -1722,6 +1722,16 @@ config BPF_JIT_DEFAULT_ON
+       def_bool ARCH_WANT_DEFAULT_BPF_JIT || BPF_JIT_ALWAYS_ON
+       depends on HAVE_EBPF_JIT && BPF_JIT
+ 
++config BPF_UNPRIV_DEFAULT_OFF
++      bool "Disable unprivileged BPF by default"
++      depends on BPF_SYSCALL
++      help
++        Disables unprivileged BPF by default by setting the corresponding
++        /proc/sys/kernel/unprivileged_bpf_disabled knob to 2. An admin can
++        still reenable it by setting it to 0 later on, or permanently
++        disable it by setting it to 1 (from which no other transition to
++        0 is possible anymore).
++
+ source "kernel/bpf/preload/Kconfig"
+ 
+ config USERFAULTFD
+--- a/kernel/bpf/syscall.c
++++ b/kernel/bpf/syscall.c
+@@ -50,7 +50,8 @@ static DEFINE_SPINLOCK(map_idr_lock);
+ static DEFINE_IDR(link_idr);
+ static DEFINE_SPINLOCK(link_idr_lock);
+ 
+-int sysctl_unprivileged_bpf_disabled __read_mostly;
++int sysctl_unprivileged_bpf_disabled __read_mostly =
++      IS_BUILTIN(CONFIG_BPF_UNPRIV_DEFAULT_OFF) ? 2 : 0;
+ 
+ static const struct bpf_map_ops * const bpf_map_types[] = {
+ #define BPF_PROG_TYPE(_id, _name, prog_ctx_type, kern_ctx_type)
+--- a/kernel/sysctl.c
++++ b/kernel/sysctl.c
+@@ -237,7 +237,27 @@ static int bpf_stats_handler(struct ctl_
+       mutex_unlock(&bpf_stats_enabled_mutex);
+       return ret;
+ }
+-#endif
++
++static int bpf_unpriv_handler(struct ctl_table *table, int write,
++                            void *buffer, size_t *lenp, loff_t *ppos)
++{
++      int ret, unpriv_enable = *(int *)table->data;
++      bool locked_state = unpriv_enable == 1;
++      struct ctl_table tmp = *table;
++
++      if (write && !capable(CAP_SYS_ADMIN))
++              return -EPERM;
++
++      tmp.data = &unpriv_enable;
++      ret = proc_dointvec_minmax(&tmp, write, buffer, lenp, ppos);
++      if (write && !ret) {
++              if (locked_state && unpriv_enable != 1)
++                      return -EPERM;
++              *(int *)table->data = unpriv_enable;
++      }
++      return ret;
++}
++#endif /* CONFIG_BPF_SYSCALL && CONFIG_SYSCTL */
+ 
+ /*
+  * /proc/sys support
+@@ -2639,10 +2659,9 @@ static struct ctl_table kern_table[] = {
+               .data           = &sysctl_unprivileged_bpf_disabled,
+               .maxlen         = sizeof(sysctl_unprivileged_bpf_disabled),
+               .mode           = 0644,
+-              /* only handle a transition from default "0" to "1" */
+-              .proc_handler   = proc_dointvec_minmax,
+-              .extra1         = SYSCTL_ONE,
+-              .extra2         = SYSCTL_ONE,
++              .proc_handler   = bpf_unpriv_handler,
++              .extra1         = SYSCTL_ZERO,
++              .extra2         = &two,
+       },
+       {
+               .procname       = "bpf_stats_enabled",
diff -Nru 
linux-5.10.46/debian/patches/bugfix/all/bpf-fix-leakage-due-to-insufficient-speculative-stor.patch
 
linux-5.10.46/debian/patches/bugfix/all/bpf-fix-leakage-due-to-insufficient-speculative-stor.patch
--- 
linux-5.10.46/debian/patches/bugfix/all/bpf-fix-leakage-due-to-insufficient-speculative-stor.patch
  1970-01-01 01:00:00.000000000 +0100
+++ 
linux-5.10.46/debian/patches/bugfix/all/bpf-fix-leakage-due-to-insufficient-speculative-stor.patch
  2021-08-02 12:36:15.000000000 +0200
@@ -0,0 +1,452 @@
+From 7e0f6483e208dc514244e383e74ff3b15bd638df Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sas...@kernel.org>
+Date: Tue, 13 Jul 2021 08:18:31 +0000
+Subject: bpf: Fix leakage due to insufficient speculative store bypass
+ mitigation
+
+From: Daniel Borkmann <dan...@iogearbox.net>
+
+[ Upstream commit 2039f26f3aca5b0e419b98f65dd36481337b86ee ]
+
+Spectre v4 gadgets make use of memory disambiguation, which is a set of
+techniques that execute memory access instructions, that is, loads and
+stores, out of program order; Intel's optimization manual, section 2.4.4.5:
+
+  A load instruction micro-op may depend on a preceding store. Many
+  microarchitectures block loads until all preceding store addresses are
+  known. The memory disambiguator predicts which loads will not depend on
+  any previous stores. When the disambiguator predicts that a load does
+  not have such a dependency, the load takes its data from the L1 data
+  cache. Eventually, the prediction is verified. If an actual conflict is
+  detected, the load and all succeeding instructions are re-executed.
+
+af86ca4e3088 ("bpf: Prevent memory disambiguation attack") tried to mitigate
+this attack by sanitizing the memory locations through preemptive "fast"
+(low latency) stores of zero prior to the actual "slow" (high latency) store
+of a pointer value such that upon dependency misprediction the CPU then
+speculatively executes the load of the pointer value and retrieves the zero
+value instead of the attacker controlled scalar value previously stored at
+that location, meaning, subsequent access in the speculative domain is then
+redirected to the "zero page".
+
+The sanitized preemptive store of zero prior to the actual "slow" store is
+done through a simple ST instruction based on r10 (frame pointer) with
+relative offset to the stack location that the verifier has been tracking
+on the original used register for STX, which does not have to be r10. Thus,
+there are no memory dependencies for this store, since it's only using r10
+and immediate constant of zero; hence af86ca4e3088 /assumed/ a low latency
+operation.
+
+However, a recent attack demonstrated that this mitigation is not sufficient
+since the preemptive store of zero could also be turned into a "slow" store
+and is thus bypassed as well:
+
+  [...]
+  // r2 = oob address (e.g. scalar)
+  // r7 = pointer to map value
+  31: (7b) *(u64 *)(r10 -16) = r2
+  // r9 will remain "fast" register, r10 will become "slow" register below
+  32: (bf) r9 = r10
+  // JIT maps BPF reg to x86 reg:
+  //  r9  -> r15 (callee saved)
+  //  r10 -> rbp
+  // train store forward prediction to break dependency link between both r9
+  // and r10 by evicting them from the predictor's LRU table.
+  33: (61) r0 = *(u32 *)(r7 +24576)
+  34: (63) *(u32 *)(r7 +29696) = r0
+  35: (61) r0 = *(u32 *)(r7 +24580)
+  36: (63) *(u32 *)(r7 +29700) = r0
+  37: (61) r0 = *(u32 *)(r7 +24584)
+  38: (63) *(u32 *)(r7 +29704) = r0
+  39: (61) r0 = *(u32 *)(r7 +24588)
+  40: (63) *(u32 *)(r7 +29708) = r0
+  [...]
+  543: (61) r0 = *(u32 *)(r7 +25596)
+  544: (63) *(u32 *)(r7 +30716) = r0
+  // prepare call to bpf_ringbuf_output() helper. the latter will cause rbp
+  // to spill to stack memory while r13/r14/r15 (all callee saved regs) remain
+  // in hardware registers. rbp becomes slow due to push/pop latency. below is
+  // disasm of bpf_ringbuf_output() helper for better visual context:
+  //
+  // ffffffff8117ee20: 41 54                 push   r12
+  // ffffffff8117ee22: 55                    push   rbp
+  // ffffffff8117ee23: 53                    push   rbx
+  // ffffffff8117ee24: 48 f7 c1 fc ff ff ff  test   rcx,0xfffffffffffffffc
+  // ffffffff8117ee2b: 0f 85 af 00 00 00     jne    ffffffff8117eee0 <-- jump 
taken
+  // [...]
+  // ffffffff8117eee0: 49 c7 c4 ea ff ff ff  mov    r12,0xffffffffffffffea
+  // ffffffff8117eee7: 5b                    pop    rbx
+  // ffffffff8117eee8: 5d                    pop    rbp
+  // ffffffff8117eee9: 4c 89 e0              mov    rax,r12
+  // ffffffff8117eeec: 41 5c                 pop    r12
+  // ffffffff8117eeee: c3                    ret
+  545: (18) r1 = map[id:4]
+  547: (bf) r2 = r7
+  548: (b7) r3 = 0
+  549: (b7) r4 = 4
+  550: (85) call bpf_ringbuf_output#194288
+  // instruction 551 inserted by verifier    \
+  551: (7a) *(u64 *)(r10 -16) = 0            | /both/ are now slow stores here
+  // storing map value pointer r7 at fp-16   | since value of r10 is "slow".
+  552: (7b) *(u64 *)(r10 -16) = r7           /
+  // following "fast" read to the same memory location, but due to dependency
+  // misprediction it will speculatively execute before insn 551/552 completes.
+  553: (79) r2 = *(u64 *)(r9 -16)
+  // in speculative domain contains attacker controlled r2. in non-speculative
+  // domain this contains r7, and thus accesses r7 +0 below.
+  554: (71) r3 = *(u8 *)(r2 +0)
+  // leak r3
+
+As can be seen, the current speculative store bypass mitigation which the
+verifier inserts at line 551 is insufficient since /both/, the write of
+the zero sanitation as well as the map value pointer are a high latency
+instruction due to prior memory access via push/pop of r10 (rbp) in contrast
+to the low latency read in line 553 as r9 (r15) which stays in hardware
+registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally,
+fp-16 can still be r2.
+
+Initial thoughts to address this issue was to track spilled pointer loads
+from stack and enforce their load via LDX through r10 as well so that /both/
+the preemptive store of zero /as well as/ the load use the /same/ register
+such that a dependency is created between the store and load. However, this
+option is not sufficient either since it can be bypassed as well under
+speculation. An updated attack with pointer spill/fills now _all_ based on
+r10 would look as follows:
+
+  [...]
+  // r2 = oob address (e.g. scalar)
+  // r7 = pointer to map value
+  [...]
+  // longer store forward prediction training sequence than before.
+  2062: (61) r0 = *(u32 *)(r7 +25588)
+  2063: (63) *(u32 *)(r7 +30708) = r0
+  2064: (61) r0 = *(u32 *)(r7 +25592)
+  2065: (63) *(u32 *)(r7 +30712) = r0
+  2066: (61) r0 = *(u32 *)(r7 +25596)
+  2067: (63) *(u32 *)(r7 +30716) = r0
+  // store the speculative load address (scalar) this time after the store
+  // forward prediction training.
+  2068: (7b) *(u64 *)(r10 -16) = r2
+  // preoccupy the CPU store port by running sequence of dummy stores.
+  2069: (63) *(u32 *)(r7 +29696) = r0
+  2070: (63) *(u32 *)(r7 +29700) = r0
+  2071: (63) *(u32 *)(r7 +29704) = r0
+  2072: (63) *(u32 *)(r7 +29708) = r0
+  2073: (63) *(u32 *)(r7 +29712) = r0
+  2074: (63) *(u32 *)(r7 +29716) = r0
+  2075: (63) *(u32 *)(r7 +29720) = r0
+  2076: (63) *(u32 *)(r7 +29724) = r0
+  2077: (63) *(u32 *)(r7 +29728) = r0
+  2078: (63) *(u32 *)(r7 +29732) = r0
+  2079: (63) *(u32 *)(r7 +29736) = r0
+  2080: (63) *(u32 *)(r7 +29740) = r0
+  2081: (63) *(u32 *)(r7 +29744) = r0
+  2082: (63) *(u32 *)(r7 +29748) = r0
+  2083: (63) *(u32 *)(r7 +29752) = r0
+  2084: (63) *(u32 *)(r7 +29756) = r0
+  2085: (63) *(u32 *)(r7 +29760) = r0
+  2086: (63) *(u32 *)(r7 +29764) = r0
+  2087: (63) *(u32 *)(r7 +29768) = r0
+  2088: (63) *(u32 *)(r7 +29772) = r0
+  2089: (63) *(u32 *)(r7 +29776) = r0
+  2090: (63) *(u32 *)(r7 +29780) = r0
+  2091: (63) *(u32 *)(r7 +29784) = r0
+  2092: (63) *(u32 *)(r7 +29788) = r0
+  2093: (63) *(u32 *)(r7 +29792) = r0
+  2094: (63) *(u32 *)(r7 +29796) = r0
+  2095: (63) *(u32 *)(r7 +29800) = r0
+  2096: (63) *(u32 *)(r7 +29804) = r0
+  2097: (63) *(u32 *)(r7 +29808) = r0
+  2098: (63) *(u32 *)(r7 +29812) = r0
+  // overwrite scalar with dummy pointer; same as before, also including the
+  // sanitation store with 0 from the current mitigation by the verifier.
+  2099: (7a) *(u64 *)(r10 -16) = 0         | /both/ are now slow stores here
+  2100: (7b) *(u64 *)(r10 -16) = r7        | since store unit is still busy.
+  // load from stack intended to bypass stores.
+  2101: (79) r2 = *(u64 *)(r10 -16)
+  2102: (71) r3 = *(u8 *)(r2 +0)
+  // leak r3
+  [...]
+
+Looking at the CPU microarchitecture, the scheduler might issue loads (such
+as seen in line 2101) before stores (line 2099,2100) because the load execution
+units become available while the store execution unit is still busy with the
+sequence of dummy stores (line 2069-2098). And so the load may use the prior
+stored scalar from r2 at address r10 -16 for speculation. The updated attack
+may work less reliable on CPU microarchitectures where loads and stores share
+execution resources.
+
+This concludes that the sanitizing with zero stores from af86ca4e3088 ("bpf:
+Prevent memory disambiguation attack") is insufficient. Moreover, the detection
+of stack reuse from af86ca4e3088 where previously data (STACK_MISC) has been
+written to a given stack slot where a pointer value is now to be stored does
+not have sufficient coverage as precondition for the mitigation either; for
+several reasons outlined as follows:
+
+ 1) Stack content from prior program runs could still be preserved and is
+    therefore not "random", best example is to split a speculative store
+    bypass attack between tail calls, program A would prepare and store the
+    oob address at a given stack slot and then tail call into program B which
+    does the "slow" store of a pointer to the stack with subsequent "fast"
+    read. From program B PoV such stack slot type is STACK_INVALID, and
+    therefore also must be subject to mitigation.
+
+ 2) The STACK_SPILL must not be coupled to 
register_is_const(&stack->spilled_ptr)
+    condition, for example, the previous content of that memory location could
+    also be a pointer to map or map value. Without the fix, a speculative
+    store bypass is not mitigated in such precondition and can then lead to
+    a type confusion in the speculative domain leaking kernel memory near
+    these pointer types.
+
+While brainstorming on various alternative mitigation possibilities, we also
+stumbled upon a retrospective from Chrome developers [0]:
+
+  [...] For variant 4, we implemented a mitigation to zero the unused memory
+  of the heap prior to allocation, which cost about 1% when done concurrently
+  and 4% for scavenging. Variant 4 defeats everything we could think of. We
+  explored more mitigations for variant 4 but the threat proved to be more
+  pervasive and dangerous than we anticipated. For example, stack slots used
+  by the register allocator in the optimizing compiler could be subject to
+  type confusion, leading to pointer crafting. Mitigating type confusion for
+  stack slots alone would have required a complete redesign of the backend of
+  the optimizing compiler, perhaps man years of work, without a guarantee of
+  completeness. [...]
+
+From BPF side, the problem space is reduced, however, options are rather
+limited. One idea that has been explored was to xor-obfuscate pointer spills
+to the BPF stack:
+
+  [...]
+  // preoccupy the CPU store port by running sequence of dummy stores.
+  [...]
+  2106: (63) *(u32 *)(r7 +29796) = r0
+  2107: (63) *(u32 *)(r7 +29800) = r0
+  2108: (63) *(u32 *)(r7 +29804) = r0
+  2109: (63) *(u32 *)(r7 +29808) = r0
+  2110: (63) *(u32 *)(r7 +29812) = r0
+  // overwrite scalar with dummy pointer; xored with random 'secret' value
+  // of 943576462 before store ...
+  2111: (b4) w11 = 943576462
+  2112: (af) r11 ^= r7
+  2113: (7b) *(u64 *)(r10 -16) = r11
+  2114: (79) r11 = *(u64 *)(r10 -16)
+  2115: (b4) w2 = 943576462
+  2116: (af) r2 ^= r11
+  // ... and restored with the same 'secret' value with the help of AX reg.
+  2117: (71) r3 = *(u8 *)(r2 +0)
+  [...]
+
+While the above would not prevent speculation, it would make data leakage
+infeasible by directing it to random locations. In order to be effective
+and prevent type confusion under speculation, such random secret would have
+to be regenerated for each store. The additional complexity involved for a
+tracking mechanism that prevents jumps such that restoring spilled pointers
+would not get corrupted is not worth the gain for unprivileged. Hence, the
+fix in here eventually opted for emitting a non-public BPF_ST | BPF_NOSPEC
+instruction which the x86 JIT translates into a lfence opcode. Inserting the
+latter in between the store and load instruction is one of the mitigations
+options [1]. The x86 instruction manual notes:
+
+  [...] An LFENCE that follows an instruction that stores to memory might
+  complete before the data being stored have become globally visible. [...]
+
+The latter meaning that the preceding store instruction finished execution
+and the store is at minimum guaranteed to be in the CPU's store queue, but
+it's not guaranteed to be in that CPU's L1 cache at that point (globally
+visible). The latter would only be guaranteed via sfence. So the load which
+is guaranteed to execute after the lfence for that local CPU would have to
+rely on store-to-load forwarding. [2], in section 2.3 on store buffers says:
+
+  [...] For every store operation that is added to the ROB, an entry is
+  allocated in the store buffer. This entry requires both the virtual and
+  physical address of the target. Only if there is no free entry in the store
+  buffer, the frontend stalls until there is an empty slot available in the
+  store buffer again. Otherwise, the CPU can immediately continue adding
+  subsequent instructions to the ROB and execute them out of order. On Intel
+  CPUs, the store buffer has up to 56 entries. [...]
+
+One small upside on the fix is that it lifts constraints from af86ca4e3088
+where the sanitize_stack_off relative to r10 must be the same when coming
+from different paths. The BPF_ST | BPF_NOSPEC gets emitted after a BPF_STX
+or BPF_ST instruction. This happens either when we store a pointer or data
+value to the BPF stack for the first time, or upon later pointer spills.
+The former needs to be enforced since otherwise stale stack data could be
+leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST |
+BPF_NOSPEC mapping is currently optimized away, but others could emit a
+speculation barrier as well if necessary. For real-world unprivileged
+programs e.g. generated by LLVM, pointer spill/fill is only generated upon
+register pressure and LLVM only tries to do that for pointers which are not
+used often. The program main impact will be the initial BPF_ST | BPF_NOSPEC
+sanitation for the STACK_INVALID case when the first write to a stack slot
+occurs e.g. upon map lookup. In future we might refine ways to mitigate
+the latter cost.
+
+  [0] https://arxiv.org/pdf/1902.05178.pdf
+  [1] 
https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/
+  [2] https://arxiv.org/pdf/1905.05725.pdf
+
+Fixes: af86ca4e3088 ("bpf: Prevent memory disambiguation attack")
+Fixes: f7cf25b2026d ("bpf: track spill/fill of constants")
+Co-developed-by: Piotr Krysiuk <piot...@gmail.com>
+Co-developed-by: Benedict Schlueter <benedict.schlue...@rub.de>
+Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
+Signed-off-by: Piotr Krysiuk <piot...@gmail.com>
+Signed-off-by: Benedict Schlueter <benedict.schlue...@rub.de>
+Acked-by: Alexei Starovoitov <a...@kernel.org>
+Signed-off-by: Sasha Levin <sas...@kernel.org>
+---
+ include/linux/bpf_verifier.h |  2 +-
+ kernel/bpf/verifier.c        | 87 +++++++++++++-----------------------
+ 2 files changed, 33 insertions(+), 56 deletions(-)
+
+diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
+index 2739a6431b9e..3d6fb346dc3b 100644
+--- a/include/linux/bpf_verifier.h
++++ b/include/linux/bpf_verifier.h
+@@ -319,8 +319,8 @@ struct bpf_insn_aux_data {
+       };
+       u64 map_key_state; /* constant (32 bit) key tracking for maps */
+       int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
+-      int sanitize_stack_off; /* stack slot to be cleared */
+       u32 seen; /* this insn was processed by the verifier at env->pass_cnt */
++      bool sanitize_stack_spill; /* subject to Spectre v4 sanitation */
+       bool zext_dst; /* this insn zero extends dst reg */
+       u8 alu_state; /* used in combination with alu_limit */
+ 
+diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
+index 36bc34fce623..e038d672200e 100644
+--- a/kernel/bpf/verifier.c
++++ b/kernel/bpf/verifier.c
+@@ -2297,6 +2297,19 @@ static int check_stack_write_fixed_off(struct 
bpf_verifier_env *env,
+       cur = env->cur_state->frame[env->cur_state->curframe];
+       if (value_regno >= 0)
+               reg = &cur->regs[value_regno];
++      if (!env->bypass_spec_v4) {
++              bool sanitize = reg && is_spillable_regtype(reg->type);
++
++              for (i = 0; i < size; i++) {
++                      if (state->stack[spi].slot_type[i] == STACK_INVALID) {
++                              sanitize = true;
++                              break;
++                      }
++              }
++
++              if (sanitize)
++                      env->insn_aux_data[insn_idx].sanitize_stack_spill = 
true;
++      }
+ 
+       if (reg && size == BPF_REG_SIZE && register_is_bounded(reg) &&
+           !register_is_null(reg) && env->bpf_capable) {
+@@ -2319,47 +2332,10 @@ static int check_stack_write_fixed_off(struct 
bpf_verifier_env *env,
+                       verbose(env, "invalid size of register spill\n");
+                       return -EACCES;
+               }
+-
+               if (state != cur && reg->type == PTR_TO_STACK) {
+                       verbose(env, "cannot spill pointers to stack into stack 
frame of the caller\n");
+                       return -EINVAL;
+               }
+-
+-              if (!env->bypass_spec_v4) {
+-                      bool sanitize = false;
+-
+-                      if (state->stack[spi].slot_type[0] == STACK_SPILL &&
+-                          register_is_const(&state->stack[spi].spilled_ptr))
+-                              sanitize = true;
+-                      for (i = 0; i < BPF_REG_SIZE; i++)
+-                              if (state->stack[spi].slot_type[i] == 
STACK_MISC) {
+-                                      sanitize = true;
+-                                      break;
+-                              }
+-                      if (sanitize) {
+-                              int *poff = 
&env->insn_aux_data[insn_idx].sanitize_stack_off;
+-                              int soff = (-spi - 1) * BPF_REG_SIZE;
+-
+-                              /* detected reuse of integer stack slot with a 
pointer
+-                               * which means either llvm is reusing stack 
slot or
+-                               * an attacker is trying to exploit 
CVE-2018-3639
+-                               * (speculative store bypass)
+-                               * Have to sanitize that slot with preemptive
+-                               * store of zero.
+-                               */
+-                              if (*poff && *poff != soff) {
+-                                      /* disallow programs where single insn 
stores
+-                                       * into two different stack slots, 
since verifier
+-                                       * cannot sanitize them
+-                                       */
+-                                      verbose(env,
+-                                              "insn %d cannot access two 
stack slots fp%d and fp%d",
+-                                              insn_idx, *poff, soff);
+-                                      return -EINVAL;
+-                              }
+-                              *poff = soff;
+-                      }
+-              }
+               save_register_state(state, spi, reg);
+       } else {
+               u8 type = STACK_MISC;
+@@ -10947,35 +10923,33 @@ static int convert_ctx_accesses(struct 
bpf_verifier_env *env)
+ 
+       for (i = 0; i < insn_cnt; i++, insn++) {
+               bpf_convert_ctx_access_t convert_ctx_access;
++              bool ctx_access;
+ 
+               if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) ||
+                   insn->code == (BPF_LDX | BPF_MEM | BPF_H) ||
+                   insn->code == (BPF_LDX | BPF_MEM | BPF_W) ||
+-                  insn->code == (BPF_LDX | BPF_MEM | BPF_DW))
++                  insn->code == (BPF_LDX | BPF_MEM | BPF_DW)) {
+                       type = BPF_READ;
+-              else if (insn->code == (BPF_STX | BPF_MEM | BPF_B) ||
+-                       insn->code == (BPF_STX | BPF_MEM | BPF_H) ||
+-                       insn->code == (BPF_STX | BPF_MEM | BPF_W) ||
+-                       insn->code == (BPF_STX | BPF_MEM | BPF_DW))
++                      ctx_access = true;
++              } else if (insn->code == (BPF_STX | BPF_MEM | BPF_B) ||
++                         insn->code == (BPF_STX | BPF_MEM | BPF_H) ||
++                         insn->code == (BPF_STX | BPF_MEM | BPF_W) ||
++                         insn->code == (BPF_STX | BPF_MEM | BPF_DW) ||
++                         insn->code == (BPF_ST | BPF_MEM | BPF_B) ||
++                         insn->code == (BPF_ST | BPF_MEM | BPF_H) ||
++                         insn->code == (BPF_ST | BPF_MEM | BPF_W) ||
++                         insn->code == (BPF_ST | BPF_MEM | BPF_DW)) {
+                       type = BPF_WRITE;
+-              else
++                      ctx_access = BPF_CLASS(insn->code) == BPF_STX;
++              } else {
+                       continue;
++              }
+ 
+               if (type == BPF_WRITE &&
+-                  env->insn_aux_data[i + delta].sanitize_stack_off) {
++                  env->insn_aux_data[i + delta].sanitize_stack_spill) {
+                       struct bpf_insn patch[] = {
+-                              /* Sanitize suspicious stack slot with zero.
+-                               * There are no memory dependencies for this 
store,
+-                               * since it's only using frame pointer and 
immediate
+-                               * constant of zero
+-                               */
+-                              BPF_ST_MEM(BPF_DW, BPF_REG_FP,
+-                                         env->insn_aux_data[i + 
delta].sanitize_stack_off,
+-                                         0),
+-                              /* the original STX instruction will immediately
+-                               * overwrite the same stack slot with 
appropriate value
+-                               */
+                               *insn,
++                              BPF_ST_NOSPEC(),
+                       };
+ 
+                       cnt = ARRAY_SIZE(patch);
+@@ -10989,6 +10963,9 @@ static int convert_ctx_accesses(struct 
bpf_verifier_env *env)
+                       continue;
+               }
+ 
++              if (!ctx_access)
++                      continue;
++
+               switch (env->insn_aux_data[i + delta].ptr_type) {
+               case PTR_TO_CTX:
+                       if (!ops->convert_ctx_access)
+-- 
+2.30.2
+
diff -Nru 
linux-5.10.46/debian/patches/bugfix/all/bpf-introduce-bpf-nospec-instruction-for-mitigating-.patch
 
linux-5.10.46/debian/patches/bugfix/all/bpf-introduce-bpf-nospec-instruction-for-mitigating-.patch
--- 
linux-5.10.46/debian/patches/bugfix/all/bpf-introduce-bpf-nospec-instruction-for-mitigating-.patch
  1970-01-01 01:00:00.000000000 +0100
+++ 
linux-5.10.46/debian/patches/bugfix/all/bpf-introduce-bpf-nospec-instruction-for-mitigating-.patch
  2021-08-02 12:35:00.000000000 +0200
@@ -0,0 +1,322 @@
+From 4be98754f14316b6ab86ff08b955b892ab146676 Mon Sep 17 00:00:00 2001
+From: Sasha Levin <sas...@kernel.org>
+Date: Tue, 13 Jul 2021 08:18:31 +0000
+Subject: bpf: Introduce BPF nospec instruction for mitigating Spectre v4
+
+From: Daniel Borkmann <dan...@iogearbox.net>
+
+[ Upstream commit f5e81d1117501546b7be050c5fbafa6efd2c722c ]
+
+In case of JITs, each of the JIT backends compiles the BPF nospec instruction
+/either/ to a machine instruction which emits a speculation barrier /or/ to
+/no/ machine instruction in case the underlying architecture is not affected
+by Speculative Store Bypass or has different mitigations in place already.
+
+This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence'
+instruction for mitigation. In case of arm64, we rely on the firmware 
mitigation
+as controlled via the ssbd kernel parameter. Whenever the mitigation is 
enabled,
+it works for all of the kernel code with no need to provide any additional
+instructions here (hence only comment in arm64 JIT). Other archs can follow
+as needed. The BPF nospec instruction is specifically targeting Spectre v4
+since i) we don't use a serialization barrier for the Spectre v1 case, and
+ii) mitigation instructions for v1 and v4 might be different on some archs.
+
+The BPF nospec is required for a future commit, where the BPF verifier does
+annotate intermediate BPF programs with speculation barriers.
+
+Co-developed-by: Piotr Krysiuk <piot...@gmail.com>
+Co-developed-by: Benedict Schlueter <benedict.schlue...@rub.de>
+Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
+Signed-off-by: Piotr Krysiuk <piot...@gmail.com>
+Signed-off-by: Benedict Schlueter <benedict.schlue...@rub.de>
+Acked-by: Alexei Starovoitov <a...@kernel.org>
+Signed-off-by: Sasha Levin <sas...@kernel.org>
+---
+ arch/arm/net/bpf_jit_32.c         |  3 +++
+ arch/arm64/net/bpf_jit_comp.c     | 13 +++++++++++++
+ arch/mips/net/ebpf_jit.c          |  3 +++
+ arch/powerpc/net/bpf_jit_comp64.c |  6 ++++++
+ arch/riscv/net/bpf_jit_comp32.c   |  4 ++++
+ arch/riscv/net/bpf_jit_comp64.c   |  4 ++++
+ arch/s390/net/bpf_jit_comp.c      |  5 +++++
+ arch/sparc/net/bpf_jit_comp_64.c  |  3 +++
+ arch/x86/net/bpf_jit_comp.c       |  7 +++++++
+ arch/x86/net/bpf_jit_comp32.c     |  6 ++++++
+ include/linux/filter.h            | 15 +++++++++++++++
+ kernel/bpf/core.c                 | 19 ++++++++++++++++++-
+ kernel/bpf/disasm.c               | 16 +++++++++-------
+ 13 files changed, 96 insertions(+), 8 deletions(-)
+
+diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
+index 0207b6ea6e8a..ce8b04326352 100644
+--- a/arch/arm/net/bpf_jit_32.c
++++ b/arch/arm/net/bpf_jit_32.c
+@@ -1602,6 +1602,9 @@ static int build_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx)
+               rn = arm_bpf_get_reg32(src_lo, tmp2[1], ctx);
+               emit_ldx_r(dst, rn, off, ctx, BPF_SIZE(code));
+               break;
++      /* speculation barrier */
++      case BPF_ST | BPF_NOSPEC:
++              break;
+       /* ST: *(size *)(dst + off) = imm */
+       case BPF_ST | BPF_MEM | BPF_W:
+       case BPF_ST | BPF_MEM | BPF_H:
+diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
+index ef9f1d5e989d..345066b8e9fc 100644
+--- a/arch/arm64/net/bpf_jit_comp.c
++++ b/arch/arm64/net/bpf_jit_comp.c
+@@ -829,6 +829,19 @@ static int build_insn(const struct bpf_insn *insn, struct 
jit_ctx *ctx,
+                       return ret;
+               break;
+ 
++      /* speculation barrier */
++      case BPF_ST | BPF_NOSPEC:
++              /*
++               * Nothing required here.
++               *
++               * In case of arm64, we rely on the firmware mitigation of
++               * Speculative Store Bypass as controlled via the ssbd kernel
++               * parameter. Whenever the mitigation is enabled, it works
++               * for all of the kernel code with no need to provide any
++               * additional instructions.
++               */
++              break;
++
+       /* ST: *(size *)(dst + off) = imm */
+       case BPF_ST | BPF_MEM | BPF_W:
+       case BPF_ST | BPF_MEM | BPF_H:
+diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
+index 561154cbcc40..b31b91e57c34 100644
+--- a/arch/mips/net/ebpf_jit.c
++++ b/arch/mips/net/ebpf_jit.c
+@@ -1355,6 +1355,9 @@ static int build_one_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx,
+               }
+               break;
+ 
++      case BPF_ST | BPF_NOSPEC: /* speculation barrier */
++              break;
++
+       case BPF_ST | BPF_B | BPF_MEM:
+       case BPF_ST | BPF_H | BPF_MEM:
+       case BPF_ST | BPF_W | BPF_MEM:
+diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
+index 022103c6a201..658ca2bab13c 100644
+--- a/arch/powerpc/net/bpf_jit_comp64.c
++++ b/arch/powerpc/net/bpf_jit_comp64.c
+@@ -646,6 +646,12 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
+                       }
+                       break;
+ 
++              /*
++               * BPF_ST NOSPEC (speculation barrier)
++               */
++              case BPF_ST | BPF_NOSPEC:
++                      break;
++
+               /*
+                * BPF_ST(X)
+                */
+diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c
+index 579575f9cdae..f300f93ba645 100644
+--- a/arch/riscv/net/bpf_jit_comp32.c
++++ b/arch/riscv/net/bpf_jit_comp32.c
+@@ -1251,6 +1251,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, 
struct rv_jit_context *ctx,
+                       return -1;
+               break;
+ 
++      /* speculation barrier */
++      case BPF_ST | BPF_NOSPEC:
++              break;
++
+       case BPF_ST | BPF_MEM | BPF_B:
+       case BPF_ST | BPF_MEM | BPF_H:
+       case BPF_ST | BPF_MEM | BPF_W:
+diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
+index 8a56b5293117..c113ae818b14 100644
+--- a/arch/riscv/net/bpf_jit_comp64.c
++++ b/arch/riscv/net/bpf_jit_comp64.c
+@@ -939,6 +939,10 @@ int bpf_jit_emit_insn(const struct bpf_insn *insn, struct 
rv_jit_context *ctx,
+               emit_ld(rd, 0, RV_REG_T1, ctx);
+               break;
+ 
++      /* speculation barrier */
++      case BPF_ST | BPF_NOSPEC:
++              break;
++
+       /* ST: *(size *)(dst + off) = imm */
+       case BPF_ST | BPF_MEM | BPF_B:
+               emit_imm(RV_REG_T1, imm, ctx);
+diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
+index fc44dce59536..dee01d3b23a4 100644
+--- a/arch/s390/net/bpf_jit_comp.c
++++ b/arch/s390/net/bpf_jit_comp.c
+@@ -1153,6 +1153,11 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, 
struct bpf_prog *fp,
+                       break;
+               }
+               break;
++      /*
++       * BPF_NOSPEC (speculation barrier)
++       */
++      case BPF_ST | BPF_NOSPEC:
++              break;
+       /*
+        * BPF_ST(X)
+        */
+diff --git a/arch/sparc/net/bpf_jit_comp_64.c 
b/arch/sparc/net/bpf_jit_comp_64.c
+index 3364e2a00989..fef734473c0f 100644
+--- a/arch/sparc/net/bpf_jit_comp_64.c
++++ b/arch/sparc/net/bpf_jit_comp_64.c
+@@ -1287,6 +1287,9 @@ static int build_insn(const struct bpf_insn *insn, 
struct jit_ctx *ctx)
+                       return 1;
+               break;
+       }
++      /* speculation barrier */
++      case BPF_ST | BPF_NOSPEC:
++              break;
+       /* ST: *(size *)(dst + off) = imm */
+       case BPF_ST | BPF_MEM | BPF_W:
+       case BPF_ST | BPF_MEM | BPF_H:
+diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
+index d5fa77256058..0a962cd6bac1 100644
+--- a/arch/x86/net/bpf_jit_comp.c
++++ b/arch/x86/net/bpf_jit_comp.c
+@@ -1141,6 +1141,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int 
*addrs, u8 *image,
+                       }
+                       break;
+ 
++                      /* speculation barrier */
++              case BPF_ST | BPF_NOSPEC:
++                      if (boot_cpu_has(X86_FEATURE_XMM2))
++                              /* Emit 'lfence' */
++                              EMIT3(0x0F, 0xAE, 0xE8);
++                      break;
++
+                       /* ST: *(u8*)(dst_reg + off) = imm */
+               case BPF_ST | BPF_MEM | BPF_B:
+                       if (is_ereg(dst_reg))
+diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
+index 2cf4d217840d..4bd0f98df700 100644
+--- a/arch/x86/net/bpf_jit_comp32.c
++++ b/arch/x86/net/bpf_jit_comp32.c
+@@ -1705,6 +1705,12 @@ static int do_jit(struct bpf_prog *bpf_prog, int 
*addrs, u8 *image,
+                       i++;
+                       break;
+               }
++              /* speculation barrier */
++              case BPF_ST | BPF_NOSPEC:
++                      if (boot_cpu_has(X86_FEATURE_XMM2))
++                              /* Emit 'lfence' */
++                              EMIT3(0x0F, 0xAE, 0xE8);
++                      break;
+               /* ST: *(u8*)(dst_reg + off) = imm */
+               case BPF_ST | BPF_MEM | BPF_H:
+               case BPF_ST | BPF_MEM | BPF_B:
+diff --git a/include/linux/filter.h b/include/linux/filter.h
+index e2ffa02f9067..822b701c803d 100644
+--- a/include/linux/filter.h
++++ b/include/linux/filter.h
+@@ -72,6 +72,11 @@ struct ctl_table_header;
+ /* unused opcode to mark call to interpreter with arguments */
+ #define BPF_CALL_ARGS 0xe0
+ 
++/* unused opcode to mark speculation barrier for mitigating
++ * Speculative Store Bypass
++ */
++#define BPF_NOSPEC    0xc0
++
+ /* As per nm, we expose JITed images as text (code) section for
+  * kallsyms. That way, tools like perf can find it to match
+  * addresses.
+@@ -372,6 +377,16 @@ static inline bool insn_is_zext(const struct bpf_insn 
*insn)
+               .off   = 0,                                     \
+               .imm   = 0 })
+ 
++/* Speculation barrier */
++
++#define BPF_ST_NOSPEC()                                               \
++      ((struct bpf_insn) {                                    \
++              .code  = BPF_ST | BPF_NOSPEC,                   \
++              .dst_reg = 0,                                   \
++              .src_reg = 0,                                   \
++              .off   = 0,                                     \
++              .imm   = 0 })
++
+ /* Internal classic blocks for direct assignment */
+ 
+ #define __BPF_STMT(CODE, K)                                   \
+diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
+index 75c2d184018a..d12efb2550d3 100644
+--- a/kernel/bpf/core.c
++++ b/kernel/bpf/core.c
+@@ -32,6 +32,8 @@
+ #include <linux/perf_event.h>
+ #include <linux/extable.h>
+ #include <linux/log2.h>
++
++#include <asm/barrier.h>
+ #include <asm/unaligned.h>
+ 
+ /* Registers */
+@@ -1380,6 +1382,7 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
bpf_insn *insn, u64 *stack)
+               /* Non-UAPI available opcodes. */
+               [BPF_JMP | BPF_CALL_ARGS] = &&JMP_CALL_ARGS,
+               [BPF_JMP | BPF_TAIL_CALL] = &&JMP_TAIL_CALL,
++              [BPF_ST  | BPF_NOSPEC] = &&ST_NOSPEC,
+               [BPF_LDX | BPF_PROBE_MEM | BPF_B] = &&LDX_PROBE_MEM_B,
+               [BPF_LDX | BPF_PROBE_MEM | BPF_H] = &&LDX_PROBE_MEM_H,
+               [BPF_LDX | BPF_PROBE_MEM | BPF_W] = &&LDX_PROBE_MEM_W,
+@@ -1624,7 +1627,21 @@ static u64 ___bpf_prog_run(u64 *regs, const struct 
bpf_insn *insn, u64 *stack)
+       COND_JMP(s, JSGE, >=)
+       COND_JMP(s, JSLE, <=)
+ #undef COND_JMP
+-      /* STX and ST and LDX*/
++      /* ST, STX and LDX*/
++      ST_NOSPEC:
++              /* Speculation barrier for mitigating Speculative Store Bypass.
++               * In case of arm64, we rely on the firmware mitigation as
++               * controlled via the ssbd kernel parameter. Whenever the
++               * mitigation is enabled, it works for all of the kernel code
++               * with no need to provide any additional instructions here.
++               * In case of x86, we use 'lfence' insn for mitigation. We
++               * reuse preexisting logic from Spectre v1 mitigation that
++               * happens to produce the required code on x86 for v4 as well.
++               */
++#ifdef CONFIG_X86
++              barrier_nospec();
++#endif
++              CONT;
+ #define LDST(SIZEOP, SIZE)                                            \
+       STX_MEM_##SIZEOP:                                               \
+               *(SIZE *)(unsigned long) (DST + insn->off) = SRC;       \
+diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
+index b44d8c447afd..ff1dd7d45b58 100644
+--- a/kernel/bpf/disasm.c
++++ b/kernel/bpf/disasm.c
+@@ -162,15 +162,17 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
+               else
+                       verbose(cbs->private_data, "BUG_%02x\n", insn->code);
+       } else if (class == BPF_ST) {
+-              if (BPF_MODE(insn->code) != BPF_MEM) {
++              if (BPF_MODE(insn->code) == BPF_MEM) {
++                      verbose(cbs->private_data, "(%02x) *(%s *)(r%d %+d) = 
%d\n",
++                              insn->code,
++                              bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
++                              insn->dst_reg,
++                              insn->off, insn->imm);
++              } else if (BPF_MODE(insn->code) == 0xc0 /* BPF_NOSPEC, no UAPI 
*/) {
++                      verbose(cbs->private_data, "(%02x) nospec\n", 
insn->code);
++              } else {
+                       verbose(cbs->private_data, "BUG_st_%02x\n", insn->code);
+-                      return;
+               }
+-              verbose(cbs->private_data, "(%02x) *(%s *)(r%d %+d) = %d\n",
+-                      insn->code,
+-                      bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+-                      insn->dst_reg,
+-                      insn->off, insn->imm);
+       } else if (class == BPF_LDX) {
+               if (BPF_MODE(insn->code) != BPF_MEM) {
+                       verbose(cbs->private_data, "BUG_ldx_%02x\n", 
insn->code);
+-- 
+2.30.2
+
diff -Nru 
linux-5.10.46/debian/patches/bugfix/all/bpf-remove-superfluous-aux-sanitation-on-subprog-rejection.patch
 
linux-5.10.46/debian/patches/bugfix/all/bpf-remove-superfluous-aux-sanitation-on-subprog-rejection.patch
--- 
linux-5.10.46/debian/patches/bugfix/all/bpf-remove-superfluous-aux-sanitation-on-subprog-rejection.patch
    1970-01-01 01:00:00.000000000 +0100
+++ 
linux-5.10.46/debian/patches/bugfix/all/bpf-remove-superfluous-aux-sanitation-on-subprog-rejection.patch
    2021-08-02 12:36:15.000000000 +0200
@@ -0,0 +1,79 @@
+From 59089a189e3adde4cf85f2ce479738d1ae4c514d Mon Sep 17 00:00:00 2001
+From: Daniel Borkmann <dan...@iogearbox.net>
+Date: Tue, 29 Jun 2021 09:39:15 +0000
+Subject: bpf: Remove superfluous aux sanitation on subprog rejection
+
+From: Daniel Borkmann <dan...@iogearbox.net>
+
+commit 59089a189e3adde4cf85f2ce479738d1ae4c514d upstream.
+
+Follow-up to fe9a5ca7e370 ("bpf: Do not mark insn as seen under speculative
+path verification"). The sanitize_insn_aux_data() helper does not serve a
+particular purpose in today's code. The original intention for the helper
+was that if function-by-function verification fails, a given program would
+be cleared from temporary insn_aux_data[], and then its verification would
+be re-attempted in the context of the main program a second time.
+
+However, a failure in do_check_subprogs() will skip do_check_main() and
+propagate the error to the user instead, thus such situation can never occur.
+Given its interaction is not compatible to the Spectre v1 mitigation (due to
+comparing aux->seen with env->pass_cnt), just remove sanitize_insn_aux_data()
+to avoid future bugs in this area.
+
+Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
+Acked-by: Alexei Starovoitov <a...@kernel.org>
+Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
+---
+ kernel/bpf/verifier.c |   34 ----------------------------------
+ 1 file changed, 34 deletions(-)
+
+--- a/kernel/bpf/verifier.c
++++ b/kernel/bpf/verifier.c
+@@ -11707,37 +11707,6 @@ static void free_states(struct bpf_verif
+       }
+ }
+ 
+-/* The verifier is using insn_aux_data[] to store temporary data during
+- * verification and to store information for passes that run after the
+- * verification like dead code sanitization. do_check_common() for subprogram 
N
+- * may analyze many other subprograms. sanitize_insn_aux_data() clears all
+- * temporary data after do_check_common() finds that subprogram N cannot be
+- * verified independently. pass_cnt counts the number of times
+- * do_check_common() was run and insn->aux->seen tells the pass number
+- * insn_aux_data was touched. These variables are compared to clear temporary
+- * data from failed pass. For testing and experiments do_check_common() can be
+- * run multiple times even when prior attempt to verify is unsuccessful.
+- *
+- * Note that special handling is needed on !env->bypass_spec_v1 if this is
+- * ever called outside of error path with subsequent program rejection.
+- */
+-static void sanitize_insn_aux_data(struct bpf_verifier_env *env)
+-{
+-      struct bpf_insn *insn = env->prog->insnsi;
+-      struct bpf_insn_aux_data *aux;
+-      int i, class;
+-
+-      for (i = 0; i < env->prog->len; i++) {
+-              class = BPF_CLASS(insn[i].code);
+-              if (class != BPF_LDX && class != BPF_STX)
+-                      continue;
+-              aux = &env->insn_aux_data[i];
+-              if (aux->seen != env->pass_cnt)
+-                      continue;
+-              memset(aux, 0, offsetof(typeof(*aux), orig_idx));
+-      }
+-}
+-
+ static int do_check_common(struct bpf_verifier_env *env, int subprog)
+ {
+       bool pop_log = !(env->log.level & BPF_LOG_LEVEL2);
+@@ -11807,9 +11776,6 @@ out:
+       if (!ret && pop_log)
+               bpf_vlog_reset(&env->log, 0);
+       free_states(env);
+-      if (ret)
+-              /* clean aux data in case subprog was rejected */
+-              sanitize_insn_aux_data(env);
+       return ret;
+ }
+ 
diff -Nru linux-5.10.46/debian/patches/series 
linux-5.10.46/debian/patches/series
--- linux-5.10.46/debian/patches/series 2021-07-28 06:28:42.000000000 +0200
+++ linux-5.10.46/debian/patches/series 2021-08-02 12:36:15.000000000 +0200
@@ -129,6 +129,10 @@
 bugfix/all/sctp-validate-from_addr_param-return.patch
 bugfix/all/sctp-add-size-validation-when-walking-chunks.patch
 bugfix/all/sctp-fix-return-value-check-in-__sctp_rcv_asconf_loo.patch
+bugfix/all/bpf-introduce-bpf-nospec-instruction-for-mitigating-.patch
+bugfix/all/bpf-fix-leakage-due-to-insufficient-speculative-stor.patch
+bugfix/all/bpf-remove-superfluous-aux-sanitation-on-subprog-rejection.patch
+bugfix/all/bpf-Add-kconfig-knob-for-disabling-unpriv-bpf-by-def.patch
 
 # Fix exported symbol versions
 bugfix/all/module-disable-matching-missing-version-crc.patch
