https://bugs.kde.org/show_bug.cgi?id=510992
nyanpasu64 <[email protected]> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |[email protected] --- Comment #26 from nyanpasu64 <[email protected]> --- For some added information, I was encountering likely the same issue on my computers. Since I have past experience debugging Linux suspend issues, I decided to debug the code to find what went wrong (in hindsight a rabbit hole deeper than I could have imagined). After I moved my Arch Linux SSD from a Ryzen desktop into an Intel NUC, Plasma would fail to sleep from the lock screen (if I left the computer idle or clicked Sleep myself). It would successfully sleep from the start menu (which matches the "good sleep with Kickoff menu" reported here). I did not try sleeping from Ctrl+Alt+Del. ## Tracing the kernel On a failed sleep, I got the "Some devices failed to suspend, or early wake event detected" error message. Based on past knowledge about debugging system suspend, I ran the following commands in a root console: echo 1 > /sys/power/pm_print_times echo 1 > /sys/power/pm_debug_messages Afterwards, a failed sleep produced `journalctl --system` messages: Oct 24 22:09:30 ryzen kernel: PM: Wakeup pending, aborting suspend Oct 24 22:09:30 ryzen kernel: PM: active wakeup source: mmc0 Oct 24 22:09:30 ryzen kernel: PM: suspend of devices aborted after 151.615 msecs Oct 24 22:09:30 ryzen kernel: PM: start suspend of devices aborted after 169.797 msecs Oct 24 22:09:30 ryzen kernel: PM: Some devices failed to suspend, or early wake event detected The "Wakeup pending, aborting suspend" message comes from function `pm_wakeup_pending()`. This function checks if event checks are enabled, and if some counters have changed aborts suspend and calls `pm_print_active_wakeup_sources()`, which prints `wakeup_sources`. I don't know what changed the counters, but instead looked into what wrote to wakeup_sources. I found that `pm_wakeup_ws_event()` would activate an event and `wakeup_source_register() → wakeup_source_add()` would add a new one. - While writing this comment, I decided to trace `pm_wakeup_pending`. It was called so many times it overflowed my terminal's scrollback! I tried running it again, but plasmashell hung, kscreenlocker_greet froze freeze at a black screen, then *kwin* itself hung, I had to enable raw input with SysRq-R to open a TTY, and rebooting hung with `rcu_tasks_wait_gp` counting up endlessly. Yay Linux... To find who changed wakeup events, I used my stacksnoop fork at https://github.com/nyanpasu64/bcc/blob/local/examples/tracing/stacksnoop.py to trace a failed suspend: nyanpasu64@ryzen ~/code/bcc (local)> sudo ./examples/tracing/stacksnoop.py pm_wakeup_ws_event wakeup_source_register TIME(s) FUNCTION 5.660198689: 0: ret_from_fork_asm [kernel] 1: ret_from_fork [kernel] 2: kthread [kernel] 3: worker_thread [kernel] 4: process_one_work [kernel] 5: async_run_entry_fn [kernel] 6: async_suspend [kernel] 7: device_suspend [kernel] 8: dpm_run_callback [kernel] 9: pci_pm_suspend [kernel] 10: __pm_runtime_resume [kernel] 11: rpm_resume [kernel] 12: rpm_callback [kernel] 13: __rpm_callback [kernel] 14: rtsx_pci_runtime_resume [rtsx_pci] 15: mmc_detect_change [mmc_core] 16: pm_wakeup_ws_event [kernel] Reading the source, the problem is that `pci_pm_suspend()` wakes PCI(e) devices before sending them into a full sleep state, but in the process, `_mmc_detect_change()` will "Prevent system sleep for 5s to allow user space to consume the\n corresponding uevent"... which interrupts a system sleep in progress! ## Debugging userspace `wakeup_count` One oddity I found was that even when I didn't get a failed sleep, stacksnoop said that `mmc_detect_change() → pm_wakeup_ws_event()` was still being called! This bug thread provides a possible solution to this last mystery; the kernel doesn't abort sleep from a wakeup event unless event checks are enabled, and perhaps it's only true when Plasma is "writing wakeup count file". The docs for `/sys/power/wakeup_count` (https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-power#:~:text=What:%09%09/sys/power/wakeup_count) say the file explicitly enables "taking into account the concurrent arrival of wakeup events". And I've found broken code in the kernel sending wakeup events during suspend. So when Plasma 6.5.0 enabled wakeup checks, it must've caused these events to abort suspend! To test with this file disabled, I ran `sudo chmod -x /usr/libexec/kf6/kauth/wakeupsourcehelper` on a fresh Fedora boot. I tried sleeping from Win+L four times, the first and third with +x and the second and fourth with -x. The journal log "Some devices failed to suspend" only appeared with +x, suggesting this workaround is necessary and sufficient to fix failed suspends. I think sleeping from the start menu doesn't fail, because `wakeupsourcehelper` only runs when sleeping from the lock screen: - I started `journalctl --follow` and slept from the lock screen (which fades to black and turns off the screen). The journal contained a message: `systemd[1]: Started dbus-:[email protected].` followed by "Some devices failed to suspend". Neither message appeared when sleeping from the start menu. - Is it an oversight that `wakeupsourcehelper` is never run by the start menu? https://invent.kde.org/plasma/powerdevil/-/blob/master/daemon/actions/bundled/suspendsession.cpp unconditionally calls "org.kde.powerdevil.wakeupsourcehelper" for sleep and hibernate, but we don't see it running. ## Workarounds - On Arch Linux, blacklisting `rtsx_pci_sdmmc` fixed system sleep, at the cost of breaking the SD card reader. - On Fedora, I added file `/etc/systemd/sleep.conf.d/mysleep.conf` with contents `[Sleep]\nSuspendState=mem mem`. This retries a failed suspend, and the second attempt will work. ## Next steps To identify the kernel drivers at fault, I strongly recommend that people on this bug thread run: echo 1 > /sys/power/pm_print_times echo 1 > /sys/power/pm_debug_messages then attempt a failed suspend (hopefully it doesn't crash), and see which devices show up in `journalctl --system -b0` or `dmesg` under "active wakeup source". Then to identify the exact stack traces, you can also run `stacksnoop.py pm_wakeup_ws_event wakeup_source_register` with my bcc repo (or use upstream https://github.com/iovisor/bcc if you don't trust my code). However once the patch is merged and disables `/sys/power/wakeup_count` in Plasma 6.5.3, I don't know how others would test what hardware is at fault. I think I'd need a powerdevil maintainer to suggest a course of action. -- You are receiving this mail because: You are watching all bug changes.
