** Description changed:

- This bug is for tracking and submitting this commit [1] once it lands in
- v5.8-rcN.
+ [Impact]
  
- [1]
- https://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
+  * Users of the Linux kernel's crypto userspace API
+    reported BUG() / kernel NULL pointer dereference
+    errors after kernel upgrades.
+ 
+  * The stack trace signature is an accept() syscall
+    going through af_alg_accept() and hitting errors
+    usually in one of:
+    - apparmor_sk_clone_security()
+    - apparmor_sock_graft()
+    - release_sock()
+ 
+ [Fix]
+       
+  * This is a regression introduced by upstream commit
+    37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
+    in sk_destruct") which made its way through stable.
+    
+  * The offending patch allows the critical regions
+    of af_alg_accept() and af_alg_release_parent() to
+    run concurrently. With the "right" sequence of
+    events on 2 CPUs, the non-atomic reference counter
+    of the alg_sock can be dropped, and then that of
+    the sock, releasing a sock that is still in use.
+ 
+  * The fix is upstream commit 34c86f4c4a7b ("crypto:
+    af_alg - fix use-after-free in af_alg_accept() due
+    to bh_lock_sock()") [1]. It changes alg_sock's ref
+    counter to atomic, which addresses the root cause.
+    
+ [Test Case]
+ 
+  * There is a synthetic test case available, which
+    uses a kprobes kernel module to synchronize the
+    concurrent CPUs on the instructions responsible
+    for the problem; and a userspace part to run it.
+ 
+  * The organic reproducer is the Varnish Cache Plus
+    software with the Crypto vmod (which uses kernel
+    crypto userspace API) under long, very high load.
+    
+  * The patch has been verified on both reproducers
+    with the 4.15 and 5.7 kernels.
+    
+  * Additional tests were run with 'stress-ng --af-alg'
+    using 11 CPUs/hogs on Bionic/Disco/Eoan/Focal
+    (all with the same version of stress-ng, V0.11.14).
+    No regressions were observed relative to the
+    original kernel. The af-alg stressor can exercise
+    almost all crypto modules shipped with the kernel,
+    so it covers more paths/crypto alg interfaces.
+    
+ [Regression Potential]
+ 
+  * The fix patch makes a fundamental change in how
+    alg_sock reference counters work, plus another
+    change to the 'nokey' counting. This of course
+    *has* a risk of regression.
+ 
+  * Regressions could theoretically manifest as
+    use-after-free errors (in case of undercounting)
+    in the af_alg functions or as silent memory leaks
+    (in case of overcounting), but also as other
+    behaviors, since reference counting is key to
+    many things.
+    
+  * FWIW, this patch was written by the crypto
+    subsystem maintainer, who certainly knows many
+    of the normal and corner cases, which gives the
+    patch more credit.
+    
+  * Testing with the organic reproducer ran as long
+    as 5 days, without issues, so it does look good.
+ 
+ [Other Info]
+ 
+  * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
+  
+ [Stack Trace Examples]
+ 
+ Examples:
+ 
+     BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
+     ...
+     RIP: 0010:apparmor_sk_clone_security+0x26/0x70
+     ...
+     Call Trace:
+      security_sk_clone+0x33/0x50
+      af_alg_accept+0x81/0x1c0 [af_alg]
+      alg_accept+0x15/0x20 [af_alg]
+      SYSC_accept4+0xff/0x210
+      SyS_accept+0x10/0x20
+      do_syscall_64+0x73/0x130
+      entry_SYSCALL_64_after_hwframe+0x3d/0xa2
+ 
+     general protection fault: 0000 [#1] SMP PTI
+     ...
+     RIP: 0010:__release_sock+0x54/0xe0
+     ...
+     Call Trace:
+      release_sock+0x30/0xa0
+      af_alg_accept+0x122/0x1c0 [af_alg]
+      alg_accept+0x15/0x20 [af_alg]
+      SYSC_accept4+0xff/0x210
+      SyS_accept+0x10/0x20
+      do_syscall_64+0x73/0x130
+      entry_SYSCALL_64_after_hwframe+0x3d/0xa2

** Also affects: linux (Ubuntu Groovy)
   Importance: Medium
     Assignee: Mauricio Faria de Oliveira (mfo)
       Status: Confirmed

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1884766

Title:
  use-after-free in af_alg_accept() due to bh_lock_sock()

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Xenial:
  New
Status in linux source package in Bionic:
  New
Status in linux source package in Eoan:
  New
Status in linux source package in Focal:
  New
Status in linux source package in Groovy:
  Confirmed

Bug description:
  [Impact]

   * Users of the Linux kernel's crypto userspace API
     reported BUG() / kernel NULL pointer dereference
     errors after kernel upgrades.

   * The stack trace signature is an accept() syscall
     going through af_alg_accept() and hitting errors
     usually in one of:
     - apparmor_sk_clone_security()
     - apparmor_sock_graft()
     - release_sock()
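
  For readers unfamiliar with the crypto userspace API, the sketch
  below shows roughly how a program reaches the accept() path named
  above. It uses Python's standard socket module; the helper name
  af_alg_sha256 and the choice of the "hash"/"sha256" algorithm are
  illustrative assumptions, not taken from the reports. It returns
  None when AF_ALG is not usable on the running system.

```python
import hashlib
import socket


def af_alg_sha256(data: bytes):
    """Hash `data` with the kernel's sha256 via AF_ALG; None if unavailable.

    Hypothetical helper for illustration only.
    """
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or Python built without AF_ALG support
    try:
        # The "parent" socket selects the algorithm (type "hash", name "sha256").
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as parent:
            parent.bind(("hash", "sha256"))
            # accept() creates the operation ("child") socket; this is the
            # syscall that goes through af_alg_accept() in the kernel.
            child, _ = parent.accept()
            with child:
                child.sendall(data)
                return child.recv(32)  # a SHA-256 digest is 32 bytes
    except OSError:
        return None  # e.g. af_alg not loaded, or sockets blocked by a sandbox


if __name__ == "__main__":
    digest = af_alg_sha256(b"hello")
    if digest is None:
        print("AF_ALG is not available here")
    else:
        # Cross-check the kernel result against the userspace implementation.
        print(digest.hex(), digest == hashlib.sha256(b"hello").digest())
```

  Closing the child socket is what eventually drops the reference on
  the parent, so a workload that opens and closes many such sockets
  from several threads exercises both sides of the race.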

  [Fix]
        
   * This is a regression introduced by upstream commit
     37f96694cf73 ("crypto: af_alg - Use bh_lock_sock
     in sk_destruct") which made its way through stable.
     
   * The offending patch allows the critical regions
     of af_alg_accept() and af_alg_release_parent() to
     run concurrently. With the "right" sequence of
     events on 2 CPUs, the non-atomic reference counter
     of the alg_sock can be dropped, and then that of
     the sock, releasing a sock that is still in use.

   * The fix is upstream commit 34c86f4c4a7b ("crypto:
     af_alg - fix use-after-free in af_alg_accept() due
     to bh_lock_sock()") [1]. It changes alg_sock's ref
     counter to atomic, which addresses the root cause.
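
  The effect of a non-atomic counter under concurrency can be shown
  with a deliberately simplified model. This is not the kernel code:
  the function names are hypothetical, and the interleaving is one
  assumed ordering of loads and stores on two CPUs, with one CPU
  incrementing (as an accept would) and the other decrementing (as
  a release would).

```python
def racy_update(start: int) -> int:
    """Non-atomic increment and decrement with their steps interleaved."""
    counter = start
    a_seen = counter      # CPU A (accept side) loads the counter
    b_seen = counter      # CPU B (release side) loads the same value
    counter = a_seen + 1  # CPU A stores its increment
    counter = b_seen - 1  # CPU B stores its decrement, overwriting A's update
    return counter


def atomic_update(start: int) -> int:
    """The same two operations when each read-modify-write is indivisible."""
    counter = start
    counter += 1  # increment completes as a single step
    counter -= 1  # decrement then observes the incremented value
    return counter


if __name__ == "__main__":
    # Starting from one reference, the racy version loses the increment and
    # reaches zero, which the release side would treat as "last reference".
    print("racy:", racy_update(1), "atomic:", atomic_update(1))
```

  In the racy case the counter reaches zero even though a new user
  was just added, which matches the description above of a sock being
  released while still in use. With indivisible updates the increment
  cannot be lost.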
     
  [Test Case]

   * There is a synthetic test case available, which
     uses a kprobes kernel module to synchronize the
     concurrent CPUs on the instructions responsible
     for the problem; and a userspace part to run it.

   * The organic reproducer is the Varnish Cache Plus
     software with the Crypto vmod (which uses kernel
     crypto userspace API) under long, very high load.
     
   * The patch has been verified on both reproducers
     with the 4.15 and 5.7 kernels.
     
   * Additional tests were run with 'stress-ng --af-alg'
     using 11 CPUs/hogs on Bionic/Disco/Eoan/Focal
     (all with the same version of stress-ng, V0.11.14).
     No regressions were observed relative to the
     original kernel. The af-alg stressor can exercise
     almost all crypto modules shipped with the kernel,
     so it covers more paths/crypto alg interfaces.
     
  [Regression Potential]

   * The fix patch makes a fundamental change in how
     alg_sock reference counters work, plus another
     change to the 'nokey' counting. This of course
     *has* a risk of regression.

   * Regressions could theoretically manifest as
     use-after-free errors (in case of undercounting)
     in the af_alg functions or as silent memory leaks
     (in case of overcounting), but also as other
     behaviors, since reference counting is key to
     many things.
     
   * FWIW, this patch was written by the crypto
     subsystem maintainer, who certainly knows many
     of the normal and corner cases, which gives the
     patch more credit.
     
   * Testing with the organic reproducer ran as long
     as 5 days, without issues, so it does look good.

  [Other Info]

   * [1] Patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34c86f4c4a7be3b3e35aa48bd18299d4c756064d
   
  [Stack Trace Examples]

  Examples:

      BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      ...
      RIP: 0010:apparmor_sk_clone_security+0x26/0x70
      ...
      Call Trace:
       security_sk_clone+0x33/0x50
       af_alg_accept+0x81/0x1c0 [af_alg]
       alg_accept+0x15/0x20 [af_alg]
       SYSC_accept4+0xff/0x210
       SyS_accept+0x10/0x20
       do_syscall_64+0x73/0x130
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2

      general protection fault: 0000 [#1] SMP PTI
      ...
      RIP: 0010:__release_sock+0x54/0xe0
      ...
      Call Trace:
       release_sock+0x30/0xa0
       af_alg_accept+0x122/0x1c0 [af_alg]
       alg_accept+0x15/0x20 [af_alg]
       SYSC_accept4+0xff/0x210
       SyS_accept+0x10/0x20
       do_syscall_64+0x73/0x130
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1884766/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
