** Changed in: linux (Ubuntu Kinetic)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Kinetic)
       Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2018470

Title:
  Linux 5.19 amdgpu: NULL pointer on GCN2 and invalid load on GCN1

Status in linux package in Ubuntu:
  Confirmed
Status in linux source package in Kinetic:
  Fix Committed
Status in linux source package in Mantic:
  Confirmed

Bug description:
  [Impact]
  A regression caused by incomplete stable backports

  [Fix]

  commit 8273b4048664fff356fd10059033f0e2f5a422a1
  Author: Arunpravin Paneer Selvam <arunpravin.paneersel...@amd.com>
  Date:   Tue Oct 18 07:08:38 2022 -0700

      drm/amdgpu: Fix for BO move issue

  [Test case]

  Install the update and check that the display works again on amdgpu.

  --

  When I upgraded from Ubuntu 22.04 to 22.10 some months ago, I had to
  stay on Linux 5.15 because 5.19 did not work on my computer. I spent
  the last two days looking for a way to run Linux 5.19, and found one
  working version: 5.19.0-23.

  Here are the versions I tested:

  - 5.19.0-23
  - 5.19.0-29
  - 5.19.0-31
  - 5.19.0-42

  In that list, only Linux 5.19.0-23 works on that computer.

  There may be other working versions that I have not tested, but
  the breakage appeared after 5.19.0-23.

  I face two problems. The first one is the graphics bug, which is
  still present in 5.19.0-42. It starts to occur with 5.19.0-31
  (5.19.0-29 is not affected): graphics break at the moment the display
  should switch from low resolution to high resolution at the very
  beginning of startup. The computer is not completely broken, but the
  graphics are dead: X11 cannot start (it tries to use the framebuffer,
  meaning the amdgpu driver is not functional).

  The second bug is the one I get with version 5.19.0-29. Linux
  5.19.0-29 doesn't have the graphics bug but has another issue that
  makes the computer unusable: some CPU gets locked up, some btrfs
  process runs at 100% CPU, and syncing never ends, even preventing a
  reboot. This bug is less important because I can't reproduce it on
  version 5.19.0-42, so once the graphics bug is fixed there, all will
  be fine.

  I have not upgraded to Ubuntu 23.04 yet because I'm afraid its newer
  kernels would leave my computer totally unusable. Because of that
  fear, I have run Ubuntu 22.10 with Ubuntu 22.04's 5.15 kernel until
  today.

  It actually took me two work days to test various combinations to boot
  the computer, so I'm staying on 5.19.0-23 for now, and I have limited
  time to test other options. I also tried various BIOS options and
  upgraded the BIOS. Since that Threadripper PRO computer has a very
  slow-booting BIOS, trying various configurations or software versions
  that require a reboot quickly eats up whole hours.

  The attached logs may have traces of DKMS modules like amdgpu-pro, but
  the first time I experienced the bug I had none of them. I reproduced
  the bug on a 5.19.0-42 kernel free of amdgpu-pro yesterday. I'm simply
  opening the ticket from my working environment, and I decided not to
  spend one more hour uninstalling amdgpu-pro and rebooting just to file
  this ticket.

  Here are some details on the hardware:

  - MOBO: Gigabyte WRX80-SU8-IPMI rev. 1.0 (BIOS version F5, also named
    WRX80PRO-F1 in dmidecode, dated 08/04/2022)
    https://www.gigabyte.com/Motherboard/WRX80-SU8-IPMI-rev-10
  - RAM: 8× Kingston Server Premier 32GB DDR4 3200 MHz ECC CL22 2Rx8
    PC4-25600 KSM32ED8/32ME 16Gbit Micron E
  - CPU: AMD Ryzen Threadripper PRO 3955WX 16-Cores (Castle Peak, Zen 2)
  - GPU: AMD Radeon R9 390X (Hawaii/Grenada, GCN2, amdgpu driver)
  - GPU: AMD Radeon R7 240 (Oland, GCN1, amdgpu driver)
  - GPU: ASPEED graphic Family rev 41

  The ASPEED graphic is a small card integrated into the motherboard as
  part of the BMC, so I cannot remove it. It may contribute to the
  problem.

  When the graphics work (Linux 5.19.0-23, Linux 5.19.0-29), the boot is
  displayed on all AMD and ASPEED graphics outputs; then, at the moment
  the display switches from low resolution to high resolution, the
  ASPEED output goes off and the display continues on the AMD cards.

  When the graphics don't work (5.19.0-31, 5.19.0-42), the boot is
  displayed on all AMD and ASPEED graphics outputs; then, at the moment
  the display switches from low resolution to high resolution, the AMD
  cards display garbage but the display continues on the ASPEED card.
  The ASPEED card is a very basic integrated card without hardware
  acceleration and with only one VGA output, so that's unusable. As
  additional information, I know X11 never starts on the ASPEED card if
  discrete cards are plugged in (tested last year).

  So right now that computer is staying on Linux 5.19.0-23, which
  doesn't have the graphics and btrfs bugs.

  The last kernel without the graphics bug is Linux 5.19.0-29. Linux
  5.19.0-31 is the first one that reproduces it (the repository doesn't
  provide 5.19.0-30 for me to test).
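
  The graphics results reported here can be summarized with a small
  sketch. It is only an illustration: the dictionary restates the four
  kernels tested in this report, and the helper names (abi_number,
  regression_window) are hypothetical, not part of any Ubuntu tool.

```python
# Graphics test results restated from this report (True = display works).
GRAPHICS_OK = {
    "5.19.0-23": True,
    "5.19.0-29": True,
    "5.19.0-31": False,
    "5.19.0-42": False,
}


def abi_number(version):
    """Return the Ubuntu ABI number, e.g. 23 for '5.19.0-23'."""
    return int(version.rsplit("-", 1)[1])


def regression_window(results):
    """Return (last_good, first_bad) among the tested kernels.

    Walks the kernels in ABI order and stops at the first failing one;
    last_good is the last working kernel seen before that point.
    """
    last_good = None
    for version in sorted(results, key=abi_number):
        if not results[version]:
            return last_good, version
        last_good = version
    return last_good, None


if __name__ == "__main__":
    last_good, first_bad = regression_window(GRAPHICS_OK)
    print(f"last good: {last_good}, first bad: {first_bad}")
```

  With the results from this report, the window narrows to the changes
  between 5.19.0-29 and 5.19.0-31.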

  I have also reproduced the graphics bug when using the radeon driver
  instead of the amdgpu one.
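
  For reference, the driver used for the GCN1 (SI) and GCN2 (CIK) cards
  is chosen with the si_support and cik_support module parameters that
  appear in ProcKernelCmdLine in this report. The sketch below is a
  hypothetical helper (driver_selection is not an existing tool) that
  only parses such a command line; it assumes the parameters are set
  explicitly and does not model kernel defaults.

```python
# Module parameters copied from the ProcKernelCmdLine in this report.
CMDLINE = (
    "ro rootflags=subvol=@ amdgpu.si_support=1 radeon.si_support=0 "
    "amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.exp_hw_support=1 "
    "amdgpu.gpu_recovery=1 amdgpu.ppfeaturemask=0xffffffff delayacct "
    "zswap.enabled=1"
)


def module_params(cmdline):
    """Collect 'module.param=value' tokens into {module: {param: value}}."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and "." in key:
            module, _, name = key.partition(".")
            params.setdefault(module, {})[name] = value
    return params


def driver_selection(cmdline):
    """List the drivers explicitly enabled for the SI and CIK generations."""
    params = module_params(cmdline)
    selection = {}
    for gen in ("si", "cik"):
        selection[gen] = [
            drv for drv in ("amdgpu", "radeon")
            if params.get(drv, {}).get(f"{gen}_support") == "1"
        ]
    return selection


if __name__ == "__main__":
    print(driver_selection(CMDLINE))
```

  Swapping the 0 and 1 values of the four si_support and cik_support
  parameters is the assumed way to test the radeon driver instead.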

  ProblemType: Bug
  DistroRelease: Ubuntu 22.10
  Package: linux-image-generic 5.19.0.42.38
  ProcVersionSignature: Ubuntu 5.19.0-23.24-generic 5.19.7
  Uname: Linux 5.19.0-23-generic x86_64
  ApportVersion: 2.23.1-0ubuntu3.3
  Architecture: amd64
  CasperMD5CheckResult: unknown
  CurrentDesktop: GNOME
  Date: Thu May  4 11:52:02 2023
  HibernationDevice: RESUME=none
  MachineType: Default string Default string
  ProcEnviron:
   LANGUAGE=fr_FR:en
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=fr_FR.UTF-8
   SHELL=/bin/bash
  ProcFB:
   0 astdrmfb
   1 amdgpudrmfb
   2 amdgpudrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-5.19.0-23-generic root=UUID=f35ecf77-511e-4dde-ac11-c1d848e97315 ro rootflags=subvol=@ amdgpu.si_support=1 radeon.si_support=0 amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.exp_hw_support=1 amdgpu.gpu_recovery=1 amdgpu.ppfeaturemask=0xffffffff delayacct zswap.enabled=1
  PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
  RelatedPackageVersions:
   linux-restricted-modules-5.19.0-23-generic N/A
   linux-backports-modules-5.19.0-23-generic  N/A
   linux-firmware                             20220923.gitf09bebf3-0ubuntu1.6
  RfKill:

  SourcePackage: linux
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 08/04/2022
  dmi.bios.release: 5.23
  dmi.bios.vendor: American Megatrends International, LLC.
  dmi.bios.version: WRX80PRO-F1
  dmi.board.asset.tag: Default string
  dmi.board.name: Default string
  dmi.board.vendor: Default string
  dmi.board.version: Default string
  dmi.chassis.asset.tag: Default string
  dmi.chassis.type: 3
  dmi.chassis.vendor: Default string
  dmi.chassis.version: Default string
  dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrWRX80PRO-F1:bd08/04/2022:br5.23:svnDefaultstring:pnDefaultstring:pvrDefaultstring:rvnDefaultstring:rnDefaultstring:rvrDefaultstring:cvnDefaultstring:ct3:cvrDefaultstring:skuDefaultstring:
  dmi.product.family: Default string
  dmi.product.name: Default string
  dmi.product.sku: Default string
  dmi.product.version: Default string
  dmi.sys.vendor: Default string
  modified.conffile..etc.default.apport: [modified]
  mtime.conffile..etc.default.apport: 2018-06-16T17:39:00.798346

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2018470/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
