------- Comment From cborn...@de.ibm.com 2022-07-15 10:04 EDT-------
(In reply to comment #15)
> This bug is awaiting verification that the linux/5.15.0-43.46 kernel in
> -proposed solves the problem. Please test the kernel and update this bug
> with the results. If the problem is solved, change the tag
> 'verification-needed-jammy' to 'verification-done-jammy'. If the problem
> still exists, change the tag 'verification-needed-jammy' to
> 'verification-failed-jammy'.
>
> If verification is not done by 5 working days from today, this fix will be
> dropped from the source code, and this bug will be closed.
>
> See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to
> enable and use -proposed. Thank you!

The same test case that succeeded on focal also succeeded on jammy.

** Tags removed: verification-needed-jammy
** Tags added: verification-done-jammy

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1979296

Title:
  [UBUNTU 20.04] Include patches to avoid self-detected stall with
  Secure Execution

Status in Ubuntu on IBM z Systems:
  Fix Committed
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  Fix Committed
Status in linux source package in Jammy:
  Fix Committed

Bug description:
  SRU Justification:
  ==================

  [Impact]

   * On IBM Z Secure Execution environments under heavy load
     (i.e. with over-committed resources from KVM guests),
     rcu_sched self-detected stalls can occur,
     which lead to LPAR crashes.

  [Fix]

   * 57c5df13eca4 57c5df13eca4017ed28f9375dc1d246ec0f54217 "KVM: s390:
  pv: add macros for UVC CC values"

   * 1e2aa46de526 1e2aa46de526a5adafe580bca4c25856bb06f09e "KVM: s390:
  pv: avoid stalls for kvm_s390_pv_init_vm"

   * f0a1a0615a6f f0a1a0615a6ff6d38af2c65a522698fb4bb85df6 "KVM: s390:
  pv: avoid stalls when making pages secure"

  [Test Plan]

   * An IBM z15 or LinuxONE III LPAR with FC 115 (secure execution)
     enabled is required.

   * Installation of Ubuntu Server 20.04 LTS (or 18.04 with hwe-5.4)
     or 22.04 LTS on top.

   * Install a kernel that includes the commits listed above.

   * Bring the system under high load with KVM guests.

   * Monitor dmesg for 'rcu_sched self-detected stall' messages
     and/or look for crashes.

   * Due to the hardware requirements, this test needs to be
     conducted by IBM.

  [Where problems could occur]

   * The definitions from 57c5df13eca4 are missing in both jammy
     and focal; adding them should be harmless.

   * The change in 1e2aa46de526 only uses uv_call_sched instead
     of uv_call, which should lead to a more responsive system
     under high load, but may consume somewhat more cycles overall.

   * In f0a1a0615a6f, uv_call_sched cannot simply replace
     uv_call, because locks are held.

   * Instead, uv_call is replaced by __uv_call, which does not loop.

   * However, if these changes to the ultravisor (UV) calls
     were erroneous, they could lead to wrong states,
     broken ultravisor calls and, with that,
     broken Secure Execution (SE).

   * As a side effect, the UV might no longer loop over all pages,
     in the worst case leaving some of them unprotected.

   * All of this is s390x-only functionality
     that is available only on IBM z15 / LinuxONE III systems and newer,
     and only if the optional feature 'FC 115' is installed;
     it is limited to Secure Execution workloads.

  [Other Info]

   * The patches were accepted upstream in kernel 5.16.

   * Commit 1e2aa46de526 is already included in jammy
     but 57c5df13eca4 and f0a1a0615a6f are missing.

   * Focal requires all 3 commits 57c5df13eca4, 1e2aa46de526 and
  f0a1a0615a6f.

   * Since impish is very close to its EOL, it is not covered by this SRU.
  __________

  ---Problem Description---
  rcu_sched self-detected stall with Secure Execution

  When the system is busy and additional Secure Execution guests are
  started, the LPAR crashes.
  Christian Borntraeger looked at the stack trace and identified two
  commits which should fix the issue:

  1e2aa46de526a5adafe580bca4c25856bb06f09e
  and
  f0a1a0615a6ff6d38af2c65a522698fb4bb85df6

  Please include these two fixes in 20.04 and 18.04 HWE.

  Here is the stack trace:

  [592792.725078] rcu: INFO: rcu_sched self-detected stall on CPU
  [592792.725089] rcu:  4-....: (2099 ticks this GP) idle=7d2/1/0x4000000000000002 softirq=3920041/3920042 fqs=984
  [592792.725133]     (t=2100 jiffies g=26268505 q=410280)
  [592792.725135] Task dump for CPU 4:
  [592792.725137] qemu-system-s39 R running task    0 2557923 1644255 0x06000004
  [592792.725139] Call Trace:
  [592792.725146] ([<000000566e2dcf52>] show_stack+0x7a/0xc0)
  [592792.725150] [<000000566dab696c>] sched_show_task.part.0+0xdc/0x100
  [592792.725151] [<000000566e2df248>] rcu_dump_cpu_stacks+0xc0/0x100
  [592792.725154] [<000000566db0510c>] rcu_sched_clock_irq+0x75c/0x980
  [592792.725156] [<000000566db1326c>] update_process_times+0x3c/0x80
  [592792.725160] [<000000566db24fea>] tick_sched_handle.isra.0+0x4a/0x70
  [592792.725161] [<000000566db2528e>] tick_sched_timer+0x5e/0xc0
  [592792.725163] [<000000566db14294>] __hrtimer_run_queues+0x114/0x2f0
  [592792.725165] [<000000566db14fdc>] hrtimer_interrupt+0x12c/0x2a0
  [592792.725167] [<000000566da14b6a>] do_IRQ+0xaa/0xb0
  [592792.725170] [<000000566e2eed08>] ext_int_handler+0x130/0x134
  [592792.725174] [<000000566da2bad8>] gmap_make_secure+0x1c8/0x340
  [592792.725175] ([<000000566da2b9fe>] gmap_make_secure+0xee/0x340)
  [592792.725180] [<000000566da6e796>] kvm_s390_pv_unpack+0xc6/0x2b0
  [592792.725183] [<000000566da535c0>] kvm_s390_handle_pv+0x390/0x580
  [592792.725184] [<000000566da55b30>] kvm_arch_vm_ioctl+0x250/0x9e0
  [592792.725187] [<000000566da44c26>] kvm_vm_ioctl+0x396/0x760
  [592792.725191] [<000000566dceb0b6>] do_vfs_ioctl+0x376/0x690
  [592792.725193] [<000000566dceb454>] ksys_ioctl+0x84/0xb0
  [592792.725194] [<000000566dceb4ea>] __s390x_sys_ioctl+0x2a/0x40
  [592792.725195] [<000000566e2ee6b2>] system_call+0x2a6/0x2c8

  Contact Information = stefan.am...@de.ibm.com,  cborn...@de.ibm.com

  ---uname output---
  5.4.0-90-generic #101-Ubuntu

  Machine Type = 8562 A00-GT2

  ---System Hang---
   LPAR crashed and needed to be rebooted.

  ---Debugger---
   A debugger is not configured

  ---Steps to Reproduce---
   Cause high load, then start a Secure Execution-enabled KVM guest.

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp