On Sat, Oct 25, 2025 at 3:01 AM Mario Limonciello (AMD)
<[email protected]> wrote:
>
> During a normal successful hibernate sequence devices will go through
> the freeze() callbacks create an image, go through the thaw() callbacks,
> and poweroff() callbacks.
>
> During a successful hibernate sequence some device drivers may want to
> skip the thaw() callbacks.  This confuses the PM core though because it
> thinks the device is no longer suspended.

The problem only really occurs if hibernation is aborted before or
during the final "poweroff" transition.

What happens is that if a driver decides to leave the device in the
"frozen" state during its "thaw" callback and its "poweroff" callback
is not invoked because hibernation is aborted earlier, the device will
be left in the "frozen" state going forward.

The goal of the change should be to make the core detect that
situation and "thaw" the device.

> To accommodate drivers that want to do this, introduce a new is_frozen
> bit that the driver can set and manage.  From the driver perspective
> any thaw() or restore() callbacks that are being skipped should set
> is_frozen and return an error code.

"Restore" has nothing to do with this, it is just about "freeze".

> The PM core will then put the device back into the list of devices to resume 
> for any aborted hibernate.

That's not what the patch does, though (see below).

All of this is mainly about what the core should do when it sees the
"is_frozen" flag set, so a few observations to that end:

1. That flag is only relevant when hibernation is aborted before
invoking the given driver's "poweroff" callback (because that callback
must be prepared to deal with a "frozen" device).

2. If the final "poweroff" transition is aborted during its "prepare"
phase, the "frozen" device may need to be "thawed" even if its
driver's "prepare" callback is not invoked.

3. There is a possibility, not taken into account so far, that
hibernation is aborted because of a failure to save the image.  Then,
"frozen" devices will need to be "thawed" before starting the final
"poweroff" transition.

Moreover, I'm not sure if it really makes sense to invoke "complete"
callbacks during the "thaw" transition for the devices left in the
"frozen" state.

All of the above means that the approach needs to be rethought and my
advice is to revert the commit causing the AMD driver to leave its
device in the "frozen" for 6.18 (and previous kernels).

> Tested-by: Muhammad Usama Anjum <[email protected]>
> Signed-off-by: Mario Limonciello (AMD) <[email protected]>
> ---
> v2:
>  * add tag
>  * fix lkp robot issue
>  * rebase on linux-pm/bleeding-edge
> ---
>  Documentation/driver-api/pm/devices.rst | 8 ++++++++
>  drivers/base/power/main.c               | 7 +++++++
>  include/linux/pm.h                      | 3 +++
>  3 files changed, 18 insertions(+)
>
> diff --git a/Documentation/driver-api/pm/devices.rst 
> b/Documentation/driver-api/pm/devices.rst
> index 36d5c9c9fd113..55c6337271086 100644
> --- a/Documentation/driver-api/pm/devices.rst
> +++ b/Documentation/driver-api/pm/devices.rst
> @@ -578,6 +578,14 @@ should already have been stored during the ``freeze``, 
> ``freeze_late`` or
>  the entire system, so it is not necessary for the callback to put the device 
> in
>  a low-power state.
>
> +Skipping thaw phase
> +-------------------
> +In some rare situations, it may be desirable to skip the thaw phases
> +(``thaw_noirq``, ``thaw_early``, ``thaw``) of a device entirely.  This can be
> +achieved by a device driver returning an error code from any of it's thaw
> +callbacks but also setting dev->power.is_frozen to true.

Returning an error code should not be necessary.

Also this needs to be done in "thaw_noirq" or maybe even in
"freeze_noirq" because "thaw_noirq" may involve some bus type actions
changing the state of the device before the driver gets to it.

So the driver would opt in for leaving the device in the "frozen"
state at the "noirq" stage of the preceding "freeze" transition.  That
can be achieved by setting a "leave_in_freeze" flag, so no callbacks
would be run for any devices with that flag set during the subsequent
"thaw" transition.

If hibernation is aborted before the final "poweroff" transition
begins, the "thaw*" and "complete" callbacks will have to be run for
all of those devices (I'm wondering if any ordering issues may arise
at that point; presumably, devices that depend on the "frozen" ones
would also need to be "frozen"?).

During the final "poweroff" transition, since "complete" has not been
called for any of the "frozen" devices, it should not be necessary to
call "prepare" for any of them, so that one can be skipped.

Again, if hibernation is aborted at this point, all of the "thaw*" and
complete callbacks need to be run for all of the "frozen" devices.

Now, "poweroff", "poweroff" and "poweroff_noirq" callbacks for the
"frozen" devices need to be prepared to deal with them, but the exact
rules there will need some consideration.  They kind of need to assume
that "freeze" may be changed into "poweroff" transparently without a
"thaw" in between and that generally depends on the bus type/PM domain
involved.

> This indicates to the
> +PM core that the device is still in the frozen state.  The PM core will 
> consider
> +this when resuming the device in later phases such as `restore` or 
> `poweroff`.
>
>  Leaving Hibernation
>  -------------------
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 7a8807ec9a5d0..c5a192fc04344 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1110,6 +1110,13 @@ static void device_resume(struct device *dev, 
> pm_message_t state, bool async)
>
>   End:
>         error = dpm_run_callback(callback, dev, state, info);
> +#ifdef CONFIG_HIBERNATE_CALLBACKS

CONFIG_HIBERNATION should be sufficient.

> +       /* device manages frozen state */
> +       if (error && dev->power.is_frozen) {
> +               dev->power.is_suspended = true;
> +               error = 0;
> +       }

This assumes that the callback will run, but what if hibernation is
aborted before running it?  Isn't that really the problem at hand?

> +#endif
>
>         device_unlock(dev);
>         dpm_watchdog_clear(&wd);
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index a72e42eec1303..852902fc72158 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -689,6 +689,9 @@ struct dev_pm_info {
>  #else
>         bool                    should_wakeup:1;
>  #endif
> +#ifdef CONFIG_HIBERNATE_CALLBACKS
> +       bool                    is_frozen:1;    /* Owned by the driver */
> +#endif
>  #ifdef CONFIG_PM
>         struct hrtimer          suspend_timer;
>         u64                     timer_expires;
> --

Reply via email to