** Also affects: ubuntu-power-systems
   Importance: Undecided
       Status: New

** Changed in: ubuntu-power-systems
   Importance: Undecided => High

** Changed in: ubuntu-power-systems
     Assignee: (unassigned) => Canonical Kernel Team (canonical-kernel-team)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1715064

Title:
  17.10 fails to boot on POWER9 DD2.0 with Deep stop states

Status in The Ubuntu-power-systems project:
  New
Status in linux package in Ubuntu:
  New

Bug description:
  == Comment: #0 - Ranjal G. Shenoy
  On Boston DD2.0 system, where deep stop states such as stop4 are enabled, the 
17.10 kernel Ubuntu-4.12.0-12.13 fails to boot.

  It requires the following upstream fixes to be backported.

  1) commit 5f221c3ca13d ("powerpc/powernv/idle: Correctly initialize 
core_idle_state_ptr")
  2) commit ec4867355244 ("powerpc/powernv/idle: Decouple Timebase restore & 
Per-core SPRs restore")
  3) commit cb0be7ec0307 ("powerpc/powernv/idle: Restore LPCR on wakeup from 
deep-stop")
  4) commit 1e1601b38e6e ("powerpc/powernv/idle: Restore SPRs for deep idle 
states via stop API.")
  5) commit 22c6663dc69a ("powerpc/powernv/idle: Use Requested Level for 
restoring state on P9 DD1")
  6) commit f9122ee4f558 ("cpuidle-powernv: Allow Deep stop states that don't 
stop time")
  7) commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT 
states when stop-api fails")
  8) commit e1c1cfed5432 ("powerpc/powernv: Save/Restore additional SPRs for 
stop4 cpuidle")
  9) commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api 
only on Hotplug")
  10) https://patchwork.ozlabs.org/patch/808233/ ("powerpc/powernv: Clear 
LPCR[PECE1] via stop-api only for deep state offline")

  Of these 1-7 are in Linux Kernel 4.13. 8 and 9 are in
  powerpc/linux.git -next branch. and 10) is posted upstream which fixes
  9).

  These patches have been backported on top of  Ubuntu-4.12.0-12.13 and
  tested on Boston where they are working as expected.

  == Comment: #1 - Ranjal G. Shenoy 
     
      The lower 8 bits of core_idle_state_ptr tracks the number of non-idle
      threads in the core. This is supposed to be initialized to bit-map
      corresponding to the threads_per_core. However, currently it is
      initialized to PNV_CORE_IDLE_THREAD_BITS (0xFF). This is correct for
      POWER8 which has 8 threads per core, but not for POWER9 which has 4
      threads per core.
      
      As a result, on POWER9, core_idle_state_ptr gets initialized to
      0xFF. In case when all the threads of the core are idle, the bits
      corresponding tracking the idle-threads are non-zero. As a result, the
      idle entry/exit code fails to save/restore per-core hypervisor state
      since it assumes that there are threads in the cores which are still
      active.
      
      Fix this by correctly initializing the lower bits of the
      core_idle_state_ptr on the basis of threads_per_core.
      
      Cherry-picked from commit 5f221c3ca13d ("powerpc/powernv/idle:
      Correctly initialize core_idle_state_ptr")

  == Comment: #2 - Ranjal G. Shenoy 
     On POWER8, in case of
         -  nap: both timebase and hypervisor state is retained.
         -  fast-sleep: timebase is lost. But the hypervisor state is retained.
         -  winkle: timebase and hypervisor state is lost.
      
      Hence, the current code for handling exit from a idle state assumes
      that if the timebase value is retained, then so is the hypervisor
      state. Thus, the current code doesn't restore per-core hypervisor
      state in such cases.
      
      But that is no longer the case on POWER9 where we do have stop states
      in which timebase value is retained, but the hypervisor state is
      lost. So we have to ensure that the per-core hypervisor state gets
      restored in such cases.
      
      Fix this by ensuring that even in the case when timebase is retained,
      we explicitly check if we are waking up from a deep stop that loses
      per-core hypervisor state (indicated by cr4 being eq or gt), and if
      this is the case, we restore the per-core hypervisor state.
      
      Cherry-picked from commit ec4867355244 ("powerpc/powernv/idle:
      Decouple Timebase restore & Per-core SPRs restore")

  == Comment: #3 - Ranjal G. Shenoy 
     On wakeup from a deep stop state which is supposed to lose the
      hypervisor state, we don't restore the LPCR to the old value but set
      it to a "sane" value via cur_cpu_spec->cpu_restore().
      
      The problem is that the "sane" value doesn't include UPRT and the HR
      bits which are required to run correctly in Radix mode.
      
      Fix this on POWER9 onwards by restoring the LPCR value whatever it was
      before executing the stop instruction.
      
      Cherry-picked from commit cb0be7ec0307 ("powerpc/powernv/idle: Restore
      LPCR on wakeup from deep-stop")

  == Comment: #4 - Ranjal G. Shenoy 
     Some of the SPR values (HID0, MSR, SPRG0) don't change during the run
      time of a booted kernel, once they have been initialized.
      
      The contents of these SPRs are lost when the CPUs enter deep stop
      states. So instead saving and restoring SPRs from the kernel, use the
      stop-api provided by the firmware by which the firmware can restore
      the contents of these SPRs to their initialized values after wakeup
      from a deep stop state.
      
      Apart from these, program the PSSCR value to that of the deepest stop
      state via the stop-api. This will be used to indicate to the
      underlying firmware as to what stop state to put the threads that have
      been woken up by a special-wakeup.
      
      And while we are at programming SPRs via stop-api, ensure that HID1,
      HID4 and HID5 registers which are only available on POWER8 are not
      requested to be restored by the firware on POWER9.
      
      Cherry-picked from commit 1e1601b38e6e ("powerpc/powernv/idle: Restore
      SPRs for deep idle states via stop API.")

  == Comment: #5 - Ranjal G. Shenoy 
     On Power9 DD1 due to a hardware bug the Power-Saving Level Status
      field (PLS) of the PSSCR for a thread waking up from a deep state can
      under-report if some other thread in the core is in a shallow stop
      state. The scenario in which this can manifest is as follows:
      
         1) All the threads of the core are in deep stop.
         2) One of the threads is woken up. The PLS for this thread will
            correctly reflect that it is waking up from deep stop.
         3) The thread that has woken up now executes a shallow stop.
         4) When some other thread in the core is woken, its PLS will reflect
            the shallow stop state.
      
      Thus, the subsequent thread for which the PLS is under-reporting the
      wakeup state will not restore the hypervisor resources.
      
      Hence, on DD1 systems, use the Requested Level (RL) field as a
      workaround to restore the contents of the hypervisor resources on the
      wakeup from the stop state.
      
      Cherry-picked from commit 22c6663dc69a ("powerpc/powernv/idle: Use
      Requested Level for restoring state on P9 DD1")

  == Comment: #6 - Ranjal G. Shenoy 
     The current code in the cpuidle-powernv intialization only allows deep
      stop states (indicated by OPAL_PM_STOP_INST_DEEP) which lose timebase
      (indicated by OPAL_PM_TIMEBASE_STOP). This assumption goes back to
      POWER8 time where deep states used to lose the timebase. However, on
      POWER9, we do have stop states that are deep (they lose hypervisor
      state) but retain the timebase.
      
      Fix the initialization code in the cpuidle-powernv driver to allow
      such deep states.
      
      Further, there is a bug in cpuidle-powernv driver with
      CONFIG_TICK_ONESHOT=n where we end up incrementing the nr_idle_states
      even if a platform idle state which loses time base was not added to
      the cpuidle table.
      
      Fix this by ensuring that the nr_idle_states variable gets incremented
      only when the platform idle state was added to the cpuidle table.
      
      Cherry-picked from commit f9122ee4f558 ("cpuidle-powernv: Allow Deep
      stop states that don't stop time")

  == Comment: #7 - Ranjal G. Shenoy 
     Currently, we use the opal call opal_slw_set_reg() to inform the
      Sleep-Winkle Engine (SLW) to restore the contents of some of the
      Hypervisor state on wakeup from deep idle states that lose full
      hypervisor context (characterized by the flag
      OPAL_PM_LOSE_FULL_CONTEXT).
      
      However, the current code has a bug in that if opal_slw_set_reg()
      fails, we don't disable the use of these deep states (winkle on
      POWER8, stop4 onwards on POWER9).
      
      This patch fixes this bug by ensuring that if programing the
      sleep-winkle engine to restore the hypervisor states in
      pnv_save_sprs_for_deep_states() fails, then we exclude such states by
      clearing the OPAL_PM_LOSE_FULL_CONTEXT flag from
      supported_cpuidle_states. As a result POWER8 will be prevented from
      using winkle for CPU-Hotplug, and POWER9 will put the offlined CPUs to
      the default stop state when available.
      
      Further, we ensure in the initialization of the cpuidle-powernv driver
      to only include those states whose flags are present in
      supported_cpuidle_states, thereby skipping OPAL_PM_LOSE_FULL_CONTEXT
      states when they have been disabled due to stop-api failure.
      
      Fixes: 1e1601b38e6 ("powerpc/powernv/idle: Restore SPRs for deep idle
      states via stop API.")
      
      Cherry-picked from commit 785a12afdb4a ("powerpc/powernv/idle: Disable
      LOSE_FULL_CONTEXT states when stop-api fails")

  == Comment: #8 - Ranjal G. Shenoy 
     The stop4 idle state on POWER9 is a deep idle state which loses
      hypervisor resources, but whose latency is low enough that it can be
      exposed via cpuidle.
      
      Until now, the deep idle states which lose hypervisor resources (eg:
      winkle) were only exposed via CPU-Hotplug.  Hence currently on wakeup
      from such states, barring a few SPRs which need to be restored to
      their older value, rest of the SPRS are reinitialized to their values
      corresponding to that at boot time.
      
      When stop4 is used in the context of cpuidle, we want these additional
      SPRs to be restored to their older value, to ensure that the context
      on the CPU coming back from idle is same as it was before going idle.
      
      In this patch, we define a SPR save area in PACA (since we have used
      up the volatile register space in the stack) and on POWER9, we restore
      SPRN_PID, SPRN_LDBAR, SPRN_FSCR, SPRN_HFSCR, SPRN_MMCRA, SPRN_MMCR1,
      SPRN_MMCR2 to the values they had before entering stop.
      
      Cherry-picked from commit e1c1cfed5432 ("powerpc/powernv: Save/Restore
      additional SPRs for stop4 cpuidle")

  == Comment: #9 - Ranjal G. Shenoy 
     Currently we use the stop-api provided by the firmware to program the
      SLW engine to restore the values of hypervisor resources that get lost
      on deeper idle states (such as winkle). Since the deep states were
      only used for CPU-Hotplug on POWER8 systems, we would program the LPCR
      to have the PECE1 bit since Hotplugged CPUs shouldn't be spuriously
      woken up by decrementer.
      
      On POWER9, some of the deep platform idle states such as stop4 can be
      used in cpuidle as well. In this case, we want the CPU in stop4 to be
      woken up by the decrementer when some timer on the CPU expires.
      
      In this patch, we program the stop-api for LPCR with PECE1
      bit cleared only when we are offlining the CPU and set it
      back once the CPU is online.
      
      Cherry-picked from commit 24be85a23d1f ("powerpc/powernv: Clear PECE1
      in LPCR via stop-api only on Hotplug")

  == Comment: #10 - Ranjal G. Shenoy 
     commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
      stop-api only on Hotplug") clears the PECE1 bit of the LPCR via
      stop-api during CPU-Hotplug to prevent wakeup due to a decrementer on
      an offlined CPU which is in a deep stop state.
      
      In the case where the stop-api support is found to be lacking, the
      commit 785a12afdb4a ("powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT
      states when stop-api fails") disables deep states that lose hypervisor
      context. Thus in this case, the offlined CPU will be put to some
      shallow idle state.
      
      However, we currently unconditionally clear the PECE1 in LPCR via
      stop-api during CPU-Hotplug even when deep states are disabled due to
      stop-api failure.
      
      Fix this by clearing PECE1 of LPCR via stop-api during CPU-Hotplug
      *only* when the offlined CPU will be put to a deep state that loses
      hypervisor context.
      
      Fixes: commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via
      stop-api only on Hotplug")

      upstream reference: https://patchwork.ozlabs.org/patch/808233/

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/1715064/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to