I decided to disconnect the Nvidia GPU and use a spare AMD GPU instead, and the issue hasn't occurred since. You're probably right that it's a hardware issue, but I don't know whether the fault lies with the GPU or the motherboard. At the time this was happening, I had the Nvidia GPU in the first PCIe slot and the AMD GPU in the second PCIe slot. The AMD GPU had no power cables running to it from the PSU, so it was essentially off and not detected by Ubuntu.
I don't know whether the motherboard could be at fault in that scenario, i.e. with a device plugged into the second slot but not powered. At first glance it looks like a fault with the Nvidia GPU, but I haven't tested it in other configurations, so I can't say definitively that the Nvidia GPU is to blame.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2023585

Title:
  [nvidia] GPU has fallen off the bus

Status in linux package in Ubuntu:
  Incomplete
Status in nvidia-graphics-drivers-525 package in Ubuntu:
  New

Bug description:
  When playing Assassin's Creed Unity through Steam, the game runs fine for a short period. Then, fairly quickly in my experience, the following happens:
  - The screen goes blank.
  - The lights on the GPU turn off.
  - The GPU fans spin at max RPM.

  I checked the dmesg logs from that session and saw this at the bottom:

  ```
  Jun 12 19:25:09 pikachu kernel: NVRM: GPU at PCI:0000:0b:00: GPU-f888943b-327b-82af-03dd-7c4213dc4788
  Jun 12 19:25:09 pikachu kernel: NVRM: Xid (PCI:0000:0b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
  Jun 12 19:25:09 pikachu kernel: NVRM: GPU 0000:0b:00.0: GPU has fallen off the bus.
  Jun 12 19:25:09 pikachu kernel: nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
  Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3hot to D0, device inaccessible
  Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3cold to D0, device inaccessible
  Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: Controller not ready at resume -19
  Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: PCI post-resume error -19!
  Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: HC died; cleaning up
  Jun 12 19:25:09 pikachu kernel: audit: type=1400 audit(1686594309.980:429): apparmor="DENIED" operation="open" class="file" profile="snap.keepassxc.keepassxc" name="/sys/devices/pci00>
  Jun 12 19:25:10 pikachu kernel: nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
  Jun 12 19:25:10 pikachu kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
  ```

  Further up in the logs I also see the following (in case it's related):

  ```
  [drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to grab modeset ownership
  ```

  I am using an RTX 2080 Ti on driver version 525.105.17. I have attached the full dmesg log.

  ProblemType: Bug
  DistroRelease: Ubuntu 23.04
  Package: nvidia-driver-525 525.105.17-0ubuntu1
  ProcVersionSignature: Ubuntu 6.2.0-20.20-generic 6.2.6
  Uname: Linux 6.2.0-20-generic x86_64
  NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
  ApportVersion: 2.26.1-0ubuntu2
  Architecture: amd64
  CasperMD5CheckResult: pass
  CurrentDesktop: ubuntu:GNOME
  Date: Mon Jun 12 19:35:37 2023
  InstallationDate: Installed on 2022-12-06 (187 days ago)
  InstallationMedia: Ubuntu 22.10 "Kinetic Kudu" - Release amd64 (20221020)
  SourcePackage: nvidia-graphics-drivers-525
  UpgradeStatus: Upgraded to lunar on 2023-04-21 (51 days ago)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2023585/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp