------- Comment From jldo...@us.ibm.com 2024-03-11 14:00 EDT-------
Thank you Frank for providing a test kernel! I have installed the test kernel 
and everything appears to be working smoothly.

---
root@neop91:/home/neo# mkvterm --id 2
vterm for partition 2 is active.  Press Control+] to exit.
IBM Virtual I/O Server

login:
Cleaning up...
vterm closed
---

syslog:

---
Mar 11 17:52:13 neop91 kernel: [   50.767514] HVCS: Driver registered.
Mar 11 17:52:13 neop91 drmgr: drmgr: -c slot -s U9080.M9S.13073A8-V1-C6 -a -w 3
Mar 11 17:52:13 neop91 kernel: [   50.853870] rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
Mar 11 17:52:13 neop91 kernel: [   51.030821] HVCS: vty-server@30000006 added to the vio bus.
Mar 11 17:52:13 neop91 kernel: [   51.030860] rpadlpar_io: slot U9080.M9S.13073A8-V1-C6 added
Mar 11 17:52:16 neop91 kernel: [   53.899124] HVCS: vty-server@30000006 connection opened.
Mar 11 17:52:23 neop91 kernel: [   61.344106] HVCS: Closed vty-server@30000006 and partner vty@30000000:2 connection.
Mar 11 17:52:24 neop91 drmgr: drmgr: -c slot -s U9080.M9S.13073A8-V1-C6 -r -w 3
Mar 11 17:52:24 neop91 kernel: [   61.468923] HVCS: Destroyed hvcs_struct for vty-server@30000006.
Mar 11 17:52:24 neop91 kernel: [   61.468925] HVCS: vty-server@30000006 removed from the vio bus.
Mar 11 17:52:24 neop91 kernel: [   61.468971] rpadlpar_io: slot U9080.M9S.13073A8-V1-C6 removed
---

HVCS is being probed and rpadlpar_io is adding the slot information
correctly. When mkvterm closes the connection with Ctrl+], HVCS is
closing the connection properly and behaving as expected.
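
The ordering seen in the syslog excerpt can also be checked mechanically. Below is a minimal sketch, assuming each kernel message sits on a single unwrapped line. The function name `hvcs_lifecycle_ok` and the `EXPECTED` list are illustrative choices built from the messages quoted in this comment, not part of any existing tool.

```python
# Substrings of the HVCS messages, in the order they appear in the excerpt.
EXPECTED = [
    "HVCS: Driver registered.",
    "added to the vio bus.",
    "connection opened.",
    "HVCS: Closed vty-server@",
    "HVCS: Destroyed hvcs_struct for",
    "removed from the vio bus.",
]

# HVCS lines copied from the syslog excerpt (timestamps dropped for brevity).
SAMPLE = [
    "kernel: HVCS: Driver registered.",
    "kernel: HVCS: vty-server@30000006 added to the vio bus.",
    "kernel: HVCS: vty-server@30000006 connection opened.",
    "kernel: HVCS: Closed vty-server@30000006 and partner vty@30000000:2 connection.",
    "kernel: HVCS: Destroyed hvcs_struct for vty-server@30000006.",
    "kernel: HVCS: vty-server@30000006 removed from the vio bus.",
]


def hvcs_lifecycle_ok(lines):
    """Return True if every expected message occurs, in order, in lines."""
    idx = 0
    for line in lines:
        # Advance only when the next expected message is found.
        if idx < len(EXPECTED) and EXPECTED[idx] in line:
            idx += 1
    return idx == len(EXPECTED)


if __name__ == "__main__":
    print(hvcs_lifecycle_ok(SAMPLE))
```

Unrelated lines (drmgr, rpaphp, rpadlpar_io) between the HVCS messages are simply skipped, so the full syslog can be fed in as-is.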

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2056373

Title:
  Problems with HVCS and hotplugging

Status in The Ubuntu-power-systems project:
  In Progress
Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Focal:
  In Progress
Status in linux source package in Jammy:
  In Progress
Status in linux source package in Mantic:
  Invalid
Status in linux source package in Noble:
  Invalid

Bug description:
  SRU Justification:
  ==================

  [Impact]

   * HVCS (Hypervisor Virtual Console Server) is broken because the
     virtual terminal mkvterm fails, caused by pvmutil failing.

   * When mkvterm is run, it ultimately fails because it calls pvmutil,
     which fails.
     pvmutil calls drmgr, and drmgr adds the slot correctly.
     However, when drmgr writes the slot information to .../add_slot,
     the return is -ENODEV.

   * This leads to HVCS never having probe() called.

   * In addition, HVCS is missing patches/fixes, and is broken without
     them.

  [Fix]

   * Fixes one and two are required for focal only; all others are
     required for focal and jammy:

   * 57409d4fb12c 57409d4fb12c185b2c0689e0496878c8f6bb5b58
     "powerpc/pseries: Fix bad drc_index_start value parsing of drc-info entry"

   * c5e76fa05b2d c5e76fa05b2df519b9f08571cc57e623c1569faa
     "powerpc/pseries: Fix of_read_drc_info_cell() to point at next record"

   * 6a9a733edd46 6a9a733edd46732e906d976dc21a42dd361e53cc
     "hvcs: Fix hvcs port reference counting"

   * 760aa5e81f33 760aa5e81f33e0da82512c4288489739a6d1c556
     "hvcs: Use dev_groups to manage hvcs device attributes"

   * 503a90dd619d 503a90dd619d52dcac2cc68bd742aa914c7cd47a
     "hvcs: Use driver groups to manage driver attributes"

   * 3a8d3b366ce4 3a8d3b366ce47024bf274eac783f8af5df2780f5
     "hvcs: Get reference to tty in remove"

   * d432228bc7b1 d432228bc7b1b3f0ed06510278ff5a77b3749fe6
     "hvcs: Use vhangup in hotplug remove"

   * 28d49f8cbe9c 28d49f8cbe9c7966f91ee1b5ec2f997f6e55bf9f
     "hvcs: Synchronize hotplug remove with port free"

  [Test Plan]

   * The high level test plan is to run mkvterm with an id.
   
   * On an unfixed kernel, mkvterm fails because the /dev/hvcs* device
     nodes are missing.

   * See https://bugs.launchpad.net/bugs/2023243 for details,
     especially the script provided by IBM
     (see original bug description: `---Steps to Reproduce---`).

   * IBM will (stress) test the updated kernel(s) provided in -proposed.
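
The failure condition in the test plan (missing /dev/hvcs* device nodes) can be checked directly before or after running mkvterm. This is a minimal sketch, assuming the nodes appear under /dev with the hvcs prefix as described above. The helper name `hvcs_device_nodes` is hypothetical.

```python
import glob
import os


def hvcs_device_nodes(dev_dir="/dev"):
    """Return the sorted hvcs* entries found under dev_dir."""
    return sorted(glob.glob(os.path.join(dev_dir, "hvcs*")))


if __name__ == "__main__":
    nodes = hvcs_device_nodes()
    if nodes:
        print("found:", ", ".join(nodes))
    else:
        print("no /dev/hvcs* device nodes present")
```

An empty result corresponds to the state in which HVCS was never probed.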

  [Where problems could occur]

   * The first two commits affect arch/powerpc/platforms/pseries/of_helpers.c
     and are needed to fix the hotplugging issue seen when drmgr goes to write
     the slot information to /sys/bus/pci/slots/control/add_slot.
     If these commits introduce issues, hotplugging with drmgr might break.

   * The issue lies in rpadlpar_io and rpaphp calling an OF helper function,
     of_read_drc_info_cell(). Without these commits, the value stored in
     drc_index_start is incorrect.
     This ultimately results in the entire SLOT string being incorrect,
     and rpaphp never finding the newly added slot by drmgr.
     rpadlpar then returns -ENODEV.
     Therefore, HVCS is never probed, and the device nodes are never created.

   * HVCS, rpadlpar_io, and rpaphp should ideally not even need to be loaded
     prior to drmgr adding a vio slot.
     If rpadlpar_io and rpaphp are not loaded, drmgr will load them.
     In addition, if rpadlpar_io and rpaphp register the new slot correctly,
     rpadlpar_io will call dlpar_add_vio_slot(),
     which calls vio_register_device_node() with the device node.
     This is what tells the driver core to init and probe HVCS
     (which is needed to create the device nodes).

   * The remaining 6 commits are needed for HVCS, which is essentially
     broken without them.
     Overall, the issues they fix are race conditions, hotplug remove issues,
     and memory leaks.

   * Please note that this is entirely ppc64el architecture-specific.

  [Other Info]

   * All the commits listed above are included in mantic and noble.
     Hence these are set to Invalid.

   * Meanwhile these requested commits have been added to other
     kernels and distros.
  __________

  ---Problem Description---
  Issues with HVCS and hotplugging.

  When working on Canonical bug 2023243, it was discovered that mkvterm
  was not working for multiple reasons. This bug will cover the issues
  found in HVCS, and hotplugging issues found when drmgr writes the slot
  information to .../add_slot.

  When mkvterm is run, it ultimately fails because it calls pvmutil,
  which fails. pvmutil calls drmgr, and drmgr adds the slot
  correctly. However, when drmgr writes the slot information to
  .../add_slot, the return is -ENODEV. This leads to HVCS never having
  probe() called. In addition, HVCS is missing patches, and is broken
  without them. 8 kernel patches have been identified to fix these
  issues.

  ---uname output---
  Linux neop91.pok.stglabs.ibm.com 5.4.0-173-generic #191-Ubuntu SMP Fri Feb 2 13:54:35 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux

  ---Steps to Reproduce---
   Run mkvterm with an id. mkvterm will fail because /dev/hvcs* device nodes
   are missing. See
   https://bugs.launchpad.net/ubuntu/+source/powerpc-utils/+bug/2023243
   for more information.

  ------------------------

  2 commits made to arch/powerpc/platforms/pseries/of_helpers.c are
  needed. These commits fix the hotplugging issue seen when drmgr goes
  to write the slot information to /sys/bus/pci/slots/control/add_slot.
  This is also why the HVCS device nodes were not being created, as
  mentioned in the previous bug.

  The issue lies in rpadlpar_io and rpaphp calling an OF helper function,
  of_read_drc_info_cell(). Without these commits, the value stored in
  drc_index_start is incorrect. This ultimately results in the entire
  SLOT string being incorrect, and rpaphp never finding the newly added
  slot by drmgr. rpadlpar then returns -ENODEV. Therefore, HVCS is never
  probed, and the device nodes are never created.
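
To make the role of drc_index_start concrete, here is a userspace sketch of reading one drc-info record. It is not the kernel code. It assumes a record consists of two NUL-terminated strings followed by five big-endian 32-bit cells with no padding. That layout, the field names other than drc_index_start, and the sample values are assumptions for illustration, not taken from this report.

```python
import struct


def read_cstring(buf, off):
    """Read a NUL-terminated ASCII string; return (string, next offset)."""
    end = buf.index(b"\x00", off)
    return buf[off:end].decode("ascii"), end + 1


def parse_drc_info(buf, off=0):
    """Parse one record; return (record dict, offset of the next record)."""
    drc_type, off = read_cstring(buf, off)
    name_prefix, off = read_cstring(buf, off)
    # The integer cells start immediately after the second string.
    # Reading them from any other offset yields a wrong drc_index_start.
    cells = struct.unpack_from(">5I", buf, off)
    off += 5 * 4
    record = {
        "drc_type": drc_type,
        "drc_name_prefix": name_prefix,
        "drc_index_start": cells[0],
        "drc_name_suffix_start": cells[1],
        "num_sequential_elems": cells[2],
        "sequential_inc": cells[3],
        "drc_power_domain": cells[4],
    }
    return record, off


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = (b"SLOT\x00" + b"U9080.M9S.13073A8-V1-C\x00"
              + struct.pack(">5I", 0x30000006, 6, 1, 1, 0xFFFFFFFF))
    rec, nxt = parse_drc_info(sample)
    print(hex(rec["drc_index_start"]), nxt == len(sample))
```

Returning the offset of the next record alongside the parsed fields mirrors the concern of the second commit title ("point at next record"): a caller iterating over several records depends on that offset being right.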

  Ideally - HVCS, rpadlpar_io, and rpaphp should not even need to be
  loaded prior to drmgr adding a vio slot. If rpadlpar_io and rpaphp are
  not loaded, drmgr will load them. In addition, if rpadlpar_io and
  rpaphp register the new slot correctly, rpadlpar_io will call
  dlpar_add_vio_slot(), which calls vio_register_device_node() with the
  device node. This is what tells the driver core to init and probe HVCS
  (which is needed to create the device nodes).

  In addition to the 2 commits mentioned above, 6 HVCS commits are
  needed. HVCS is essentially broken without them. Issues include race
  conditions, hotplug remove issues, as well as memory leaks. These
  commits have been added to other distros after multiple issues were
  seen. Without these commits, 20.04 will experience the same issues.
  IBM plans on stress testing these changes after an updated kernel is
  provided in focal-proposed.

  -------

  2 commits that make changes to
  arch/powerpc/platforms/pseries/of_helpers.c:

  powerpc/pseries: Fix bad drc_index_start value parsing of drc-info entry
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=57409d4fb12c185b2c0689e0496878c8f6bb5b58

  powerpc/pseries: Fix of_read_drc_info_cell() to point at next record
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c5e76fa05b2df519b9f08571cc57e623c1569faa

  HVCS commits:

  hvcs: Fix hvcs port reference counting
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6a9a733edd46732e906d976dc21a42dd361e53cc

  hvcs: Use dev_groups to manage hvcs device attributes
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=760aa5e81f33e0da82512c4288489739a6d1c556

  hvcs: Use driver groups to manage driver attributes
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=503a90dd619d52dcac2cc68bd742aa914c7cd47a

  hvcs: Get reference to tty in remove
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a8d3b366ce47024bf274eac783f8af5df2780f5

  hvcs: Use vhangup in hotplug remove
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d432228bc7b1b3f0ed06510278ff5a77b3749fe6

  hvcs: Synchronize hotplug remove with port free
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28d49f8cbe9c7966f91ee1b5ec2f997f6e55bf9f
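
To see which of the eight commits listed above are still missing from a given kernel branch, one option is to scan `git log` output for their full IDs. This is a sketch under the assumption that a backport either keeps the upstream ID or mentions it in the commit message (for example in a "cherry picked from commit ..." line). A backport that does neither would be reported as missing. The function name `missing_fixes` is hypothetical.

```python
import re

# Upstream commit IDs from the list above.
FIX_COMMITS = [
    "57409d4fb12c185b2c0689e0496878c8f6bb5b58",
    "c5e76fa05b2df519b9f08571cc57e623c1569faa",
    "6a9a733edd46732e906d976dc21a42dd361e53cc",
    "760aa5e81f33e0da82512c4288489739a6d1c556",
    "503a90dd619d52dcac2cc68bd742aa914c7cd47a",
    "3a8d3b366ce47024bf274eac783f8af5df2780f5",
    "d432228bc7b1b3f0ed06510278ff5a77b3749fe6",
    "28d49f8cbe9c7966f91ee1b5ec2f997f6e55bf9f",
]


def missing_fixes(git_log_text, commits=FIX_COMMITS):
    """Return the commits whose full ID does not appear in git_log_text."""
    # Collect every 40-hex-digit ID mentioned anywhere in the log text.
    referenced = set(re.findall(r"\b[0-9a-f]{40}\b", git_log_text))
    return [c for c in commits if c not in referenced]


if __name__ == "__main__":
    import sys
    # Usage: git log | python3 this_script.py
    for commit in missing_fixes(sys.stdin.read()):
        print("missing:", commit)
```

The check is a pure text scan, so it also works on a saved log file without access to the repository.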
