------- Comment From klaus.theur...@de.ibm.com 2020-09-29 03:27 EDT-------
(In reply to comment #166)
> I took this from "linux-next" (where it was tagged with 'next-20200923') as:
> ~/linux-next$ git show a02b55ea66b9
> commit a02b55ea66b9257744528da609a26279152a3bc3
> Author: Vasily Gorbik <g...@linux.ibm.com>
> Date:   Wed Sep 23 09:49:28 2020 +1000
>
> mm/gup: fix gup_fast with dynamic page table folding
>
> Currently to make sure that every page table entry is read just once
> gup_fast walks perform READ_ONCE and pass pXd value down to the next
> gup_pXd_range function by value e.g.:
>
> static int gup_pud_range(p4d_t p4d, unsigned long addr, unsigned long end,
> unsigned int flags, struct page **pages, int *nr)
> ...
>
> and built a patched groovy and a patched focal kernel, available here:
> https://people.canonical.com/~fheimes/lp1896726/
>
> Do you have a chance giving these a try?

I'll run a test with my MongoDB setup.
Can you provide the debug symbol package for the kernel as well?

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1896726

Title:
  [UBUNTU 20.04.1] qemu (secure guest) crash due to gup_fast / dynamic
  page table folding issue

Status in Ubuntu on IBM z Systems:
  In Progress
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  Incomplete
Status in linux source package in Groovy:
  In Progress

Bug description:
  Justification:
  ==============

  Secure KVM guests (using secure execution on Ubuntu Server 20.04 for s390x)
  crash from time to time during boot.
  Such crashed guests ("reason=crashed" in the libvirt log) end up in Shutoff
  state instead of Crashed state (<on_crash> preserve is set).
  The crash points to a kernel memory management problem, addressed by the
  following patch/fix.
  The modifications touch common memory management code,
  but have no effect on architectures other than s390x.
  This is ensured by the fact that only s390 provides / implements the new
  helper functions.
  And for s390x, this is actually a critical (and carefully tested) fix for a
  (previous) regression, so it can hardly get any more regressive.
  The patch has landed upstream in linux-next and is discussed in depth
  on LKML:
  https://lkml.kernel.org/r/20190418100218.0a4afd51@mschwideX1
  and here:
  https://lore.kernel.org/linux-arch/patch.git-943f1e5dcff2.your-ad-here.call-01599856292-ext-8676@work.hours/
  It will soon land in focal, too, via the regular upstream stable release
  update for kernel 5.4.
  That process has already started:
  https://lore.kernel.org/stable/patch-1.thread-41918b.git-41918be365c0.your-ad-here.call-01600439945-ext-8991@work.hours/

  Hence this cherry-pick of the upstream patch should be added to groovy
  to avoid a potential regression in case the patch lands in focal via the
  upstream stable release update process but is not in groovy, and someone
  upgrades from focal to groovy.

  __________

  Secure Execution with Ubuntu 20.04: secure guests crash during boot
  from time to time. A crashed guest goes into Shutoff state instead of
  Crashed state (<on_crash>preserve is set), so I can't get a dump.

  libvirt log file:
  2020-04-21T16:35:39.382999Z qemu-system-s390x: Guest says index 19608 is available
  2020-04-21 16:35:44.831+0000: shutting down, reason=crashed

  ---uname output---
  Linux ubu204uclg1002 5.4.0-25-generic #29-Ubuntu SMP Fri Apr 17 15:05:32 UTC 2020 s390x s390x s390x GNU/Linux

  Machine Type = z15 8561

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  I have a setup with 72 KVM guests which I can start in secure or non-secure
  mode. Starting all of them in secure mode back to back results in a number
  of guests (4..8) in Shutoff state with reason=crashed in the libvirt log.
  I can manually start such a guest again without any problem. Different
  guests fail on each run.
  Host and guests are on the latest Ubuntu 20.04.
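
  With 72 guests, checking each libvirt log by hand is tedious. The
  following is only a minimal sketch of how the affected guests could be
  picked out, assuming each guest's log text has already been read into a
  dictionary keyed by guest name (on a default libvirt setup the per-guest
  logs usually live under /var/log/libvirt/qemu/, but that path and the
  guest names below are assumptions, not taken from this report). It only
  relies on the "shutting down, reason=..." line format shown in the
  libvirt log excerpt above.

```python
import re

# Matches lines such as:
#   2020-04-21 16:35:44.831+0000: shutting down, reason=crashed
SHUTDOWN_RE = re.compile(r"shutting down, reason=(\w+)")


def shutdown_reasons(log_text):
    """Return every shutdown reason found in one libvirt domain log, in order."""
    return SHUTDOWN_RE.findall(log_text)


def crashed_guests(logs):
    """Given {guest_name: log_text}, return the sorted names of guests
    whose most recent shutdown reason is 'crashed'."""
    crashed = []
    for name, text in sorted(logs.items()):
        reasons = shutdown_reasons(text)
        if reasons and reasons[-1] == "crashed":
            crashed.append(name)
    return crashed


if __name__ == "__main__":
    # Hypothetical guest names; the first log text is the excerpt from this report.
    logs = {
        "guest01": (
            "2020-04-21T16:35:39.382999Z qemu-system-s390x: "
            "Guest says index 19608 is available\n"
            "2020-04-21 16:35:44.831+0000: shutting down, reason=crashed\n"
        ),
        "guest02": "2020-04-21 16:40:00.000+0000: shutting down, reason=shutdown\n",
    }
    print(crashed_guests(logs))
```

  Only the last shutdown reason per guest is considered, so a guest that
  crashed once and was then restarted and shut down cleanly is not reported.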

  The proposed fix (kernel memory management) has landed in Andrew Morton's
  mm tree:
  https://lore.kernel.org/mm-commits/20200916003608.ib4ln%25a...@linux-foundation.org/T/#u

  Please note: while this was found with secure execution, the bug is
  actually present for non-KVM workloads as well.

  The complete patch is this:
  https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=a338e69ba37286c0fc300ab7e6fa0227e6ca68b1

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1896726/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp