Am 21.11.25 um 21:35 schrieb Rafael J. Wysocki:
On Thu, Nov 20, 2025 at 4:41 AM Armin Wolf <[email protected]> wrote:
Drivers registering thermal zone/cooling devices are currently unable
to tell the thermal core what parent device the new thermal zone/
cooling device should have, potentially causing issues with suspend
ordering
This is one potential class of problems that may arise, but I would
like to see a real example of this.
As it stands today, thermal_class has no PM callbacks, so there are no
callback execution ordering issues with devices in that class and what
other suspend/resume ordering issues are there?
Correct, that is why i said "potentially".
Also, the suspend and resume of thermal zones is handled via PM
notifiers. Is there a problem with this?
The problem with PM notifiers is that thermal zones stop working even before
user space is frozen. Freezing user space might take a lot of time, so having
no thermal management during this period is less than ideal.
This problem would not occur when using dev_pm_ops, as thermal zones would be
suspended after user space has been frozen successfully. Additionally, when
using
dev_pm_ops we can get rid of thermal_pm_suspended, as the device core already
mandates
that no new devices (including thermal zones and cooling devices) be registered
during
a suspend/resume cycle.
Replacing the PM notifiers with dev_pm_ops would of course be a optimization
with
its own patch series.
and making it impossible for user space applications to
associate a given thermal zone device with its parent device.
Why does user space need to know the parent of a given cooling device
or thermal zone?
Lets say that we have two thermal zones registered by two instances of the
Intel Wifi driver. User space is currently unable to find out which thermal zone
belongs to which Wifi adapter, as both thermal zones have the (nearly) same type string
("iwlwifi[0-X]").
This problem would be solved once we populate the parent device pointer inside
the thermal zone
device, as user space can simply look at the "device" symlink to determine the
parent device behind
a given thermal zone device.
Additionally, being able to access the acpi_handle of the parent device will be
necessary for the
ACPI thermal zone driver to support cooling devices other than ACPI fans and
ACPI processors.
This patch series aims to fix this issue by extending the functions
used to register thermal zone/cooling devices to also accept a parent
device pointer. The first six patches convert all functions used for
registering cooling devices, while the functions used for registering
thermal zone devices are converted by the remaining two patches.
I tested this series on various devices containing (among others):
- ACPI thermal zones
- ACPI processor devices
- PCIe cooling devices
- Intel Wifi card
- Intel powerclamp
- Intel TCC cooling
What exactly did you do to test it?
I tested:
- the thermal zone temperature readout
- correctness of the new sysfs links
- suspend/resume
I also verified that ACPI thermal zones still bind with the ACPI fans.
I also compile-tested the remaining affected drivers, however i would
still be happy if the relevant maintainers (especially those of the
mellanox ethernet switch driver) could take a quick glance at the
code and verify that i am using the correct device as the parent
device.
I think that the above paragraph is not relevant any more?
You are right, however i originally meant to CC the mellanox maintainers as
i was a bit unsure about the changes i made to their driver. I will rework
this section in the next revision and CC the mellanox maintainers.
This work is also necessary for extending the ACPI thermal zone driver
to support the _TZD ACPI object in the future.
I'm still unsure why _TZD support requires the ability to set a
thermal zone parent device.
_TZD allows the ACPI thermal zone to bind to cooling devices other than ACPI
fans
and ACPI processors, like ACPI batteries. This however will currently not work
as
the ACPI thermal zone driver uses the private drvdata of the cooling device to
determine if said cooling device should bind. This only works for ACPI fans and
processors due to the fact that those drivers store a ACPI device pointer inside
drvdata, something the ACPI thermal zone expects.
As we cannot require all cooling devices to store an ACPI device pointer inside
their drvdata field in order to support ACPI, we must use a more generic
approach.
I was thinking about using the acpi_handle of the parent device instead of
messing
with the drvdata field, but this only works if the parent device pointer of the
cooling device is populated.
(Cooling devices without a parent device would then be ignored by the ACPI
thermal
zone driver, as such cooling devices cannot be linked to ACPI).
Signed-off-by: Armin Wolf <[email protected]>
---
Armin Wolf (8):
thermal: core: Allow setting the parent device of cooling devices
thermal: core: Set parent device in thermal_of_cooling_device_register()
ACPI: processor: Stop creating "device" sysfs link
That link is not to the cooling devices' parent, but to the ACPI
device object (a struct acpi_device) that corresponds to the parent.
The parent of the cooling device should be the processor device, not
its ACPI companion, so I'm not sure why there would be a conflict.
From the perspective of the Linux device core, a parent device does not have to
be
a "physical" device. In the case of the ACPI processor driver, the ACPI device
is used,
so the cooling device registered by said driver belongs to the ACPI device. I
agree
that using the Linux processor device would make more sense, but this will
require
changes inside the ACPI processor driver.
As for the "device" symlink: The conflict would be a naming conflict, as both
"device" symlinks
(the one created by the ACPI processor driver and the one created by the device
core) will
be created in the same directory (which is the directory of the cooling device).
ACPI: fan: Stop creating "device" sysfs link
ACPI: video: Stop creating "device" sysfs link
Analogously in the above two cases AFAICS.
The parent of a cooling device should be a "physical" device object,
like a platform device or a PCI device or similar, not a struct
acpi_device (which in fact is not a device even).
From the perspective of the Linux device core, a ACPI device is a perfectly
valid device.
I agree that using a platform device or PCI device is better, but this already
happens
inside the ACPI fan driver (platform device).
Only the ACPI video driver created a "device" sysfs link that points to the
ACPI device
instead of the PCI device. I just noticed that i accidentally changed this by
using the
PCI device as the parent device for the cooling device.
If you want then we can keep this change.
thermal: core: Set parent device in thermal_cooling_device_register()
ACPI: thermal: Stop creating "device" sysfs link
And this link is to the struct acpi_device representing the thermal zone itself.
Correct, the ACPI thermal zone driver is a ACPI driver, meaning that he binds to
ACPI devices. Because of this all (thermal zone) devices created by an instance
of
said driver are descendants of the ACPI device said instance is bound to.
We can of course convert the ACPI thermal zone driver into a platform driver,
but
this would be a separate patch series.
thermal: core: Allow setting the parent device of thermal zone devices
I'm not sure if this is a good idea, at least until it is clear what
the role of a thermal zone parent device should be.
Take a look at my explanation with the Intel Wifi driver.
Thanks,
Armin Wolf