On Thu, 04 Dec 2025, Imre Deak <[email protected]> wrote: > On Thu, Dec 04, 2025 at 01:30:27PM +0200, Jani Nikula wrote: >> On Wed, 03 Dec 2025, Imre Deak <[email protected]> wrote: >> > On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote: >> >> On Tue, 02 Dec 2025, Imre Deak <[email protected]> wrote: >> >> > On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote: >> >> >> On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote: >> >> >> > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been >> >> >> > initialized, and dmc is thus NULL. >> >> >> > >> >> >> > That would be the case when the call path is >> >> >> > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() -> >> >> >> > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as >> >> >> > intel_power_domains_init_hw() is called *before* intel_dmc_init(). >> >> >> > >> >> >> > However, gen9_set_dc_state() calls >> >> >> > intel_dmc_update_dc6_allowed_count() >> >> >> > conditionally, depending on the current and target DC states. At >> >> >> > probe, >> >> >> > the target is disabled, but if DC6 is enabled, the function is >> >> >> > called, >> >> >> > and an oops follows. Apparently it's quite unlikely that DC6 is >> >> >> > enabled >> >> >> > at probe, as we haven't seen this failure mode before. >> >> >> > >> >> >> > Add NULL checks and switch the dmc->display references to just >> >> >> > display. >> >> >> > >> >> >> > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 >> >> >> > counter") >> >> >> > Cc: Mohammed Thasleem <[email protected]> >> >> >> > Cc: Imre Deak <[email protected]> >> >> >> > Cc: <[email protected]> # v6.16+ >> >> >> > Signed-off-by: Jani Nikula <[email protected]> >> >> >> > >> >> >> > --- >> >> >> > >> >> >> > Rare case, but this may also throw off the rc6 counting in debugfs >> >> >> > when >> >> >> > it does happen. >> >> >> >> >> >> Yes, I missed the case where the driver is being loaded while DC6 is >> >> >> enabled, this is what happens for the reporter: >> >> >> >> >> >> i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2) >> >> >> >> >> >> That's odd, as DC6 requires the DMC firmware, which - if it's indeed >> >> >> loaded by BIOS for instance - will be overwritten by the driver, not a >> >> >> well specified sequence (even though the driver is trying to handle it >> >> >> correctly by disabling any active firmware handler). >> >> >> >> >> >> But as you pointed out this would also throw off the cooked-up DC6 >> >> >> counter tracking, >> >> > >> >> > Actually the patch would keep the counter working, as the counter >> >> > wouldn't be updated in the dmc==NULL case. However I still think the >> >> > correct fix would be to check the correct DC state, which from the POV >> >> > of the counter tracking is the driver's version of the state, not the HW >> >> > state. >> >> >> >> One thing I failed to mention is that this happens in a KASAN run in >> >> QEMU. So I'm kind of not surprised we haven't hit this before. And it >> >> impacts the deductions about the DC state. >> > >> > Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register >> > to a non-zero value then. But in any case the driver should handle it. >> > >> >> I'm not quite sure what exactly you're suggesting, maybe a draft patch >> >> would communicate the idea better than plain English? ;) >> > >> > intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as >> > the debugfs entry can be read at any point. With that, what I meant is: >> > >> > in gen9_set_dc_state(): >> > ... >> > - dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6; >> > + dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6; >> >> I still don't understand why we can trust our own value rather than >> what's in the hardware in this case. > > The BIOS/FW can set random flags in the register, as in the above case, > so it can't be trusted. The counter update ending the tracking of the > duration of a DC6 enabled state only works if the driver did in fact > enable DC6 previously and has started the tracking accordingly. This is > only guaranteed if the driver has set DC_STATE_EN_UPTO_DC6 in > power_domains->dc_state, the corrsponding HW flag doesn't guarantee it. > >> For resume, we even call gen9_sanitize_dc_state(), but not for probe. > > After system suspend, the driver enabling DC9 by setting the > corrsponding DC9 flag in the DC_STATE_EN register, the HW/firmware will > disable DC9 while resuming. The SW version of the DC state will be > updated accordingly in the above function to reflect the disabled DC9 > state. > >> > ... >> > >> > in intel_dmc_get_dc6_allowed_count(): >> > ... >> > if (DISPLAY_VER(display) < 14) >> > return false; >> > >> > + if (!dmc) { >> > + *count = 0; >> > + return true; >> > + } >> > + >> >> This seems neat but is overkill. dmc is never NULL here, but I added the >> check for completeness. > > intel_dmc_get_dc6_allowed_count() shouldn't fall back on DISPLAY_VER>=14 > to report the DC6 residency in a way that only works for older > platforms. Hence the function should return true for DISPLAY_VER>=14. > >> It's the intel_dmc_update_dc6_allowed_count() that's more fragile, and I >> want that to have the !dmc check, instead of relying on the subtle >> dependency on power_domains->dc_state. > > The counter tracking should depend on the power_domians->dc_state SW > state as described above, so that's the correct thing to do there. > dmc==NULL in intel_dmc_update_dc6_allowed_count() would be only a bug in > the driver, if you wanted to check for that it should be a > WARN_ON(!dmc) check.
So we still have the NULL pointer dereference at probe, albeit very unlikely. I don't really know what to do here. Care to send a patch to fix it? BR, Jani. > >> > mutex_lock(&power_domains->lock); >> > - dc6_enabled = intel_de_read(display, DC_STATE_EN) & >> > - DC_STATE_EN_UPTO_DC6; >> > + dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6; >> > ... >> > >> >> Anyway, I think "not oopsing" is a lot better than "inaccurate DC >> >> counters in debugfs". >> > >> > Agreed, the above would ensure both. >> > >> >> >> >> BR, >> >> Jani. >> >> >> >> >> >> > >> >> >> so could instead the counter update depend on the >> >> >> driver's DC state instead of the HW state? I.e. set >> >> >> gen9_set_dc_state()/dc6_was_enabled, >> >> >> intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state >> >> >> says that DC6 was indeed enabled by the driver (instead of checking the >> >> >> HW state). >> >> >> >> >> >> That would fix the reporter's oops when calling >> >> >> intel_dmc_update_dc6_allowed_count(start_tracking=false), by not >> >> >> calling >> >> >> it if the driver hasn't actually enabled DC6 and it would also keep the >> >> >> DC6 counter tracking correct. >> >> >> >> >> >> intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also >> >> >> guaranteed to be called only once the firmware is loaded, as until that >> >> >> point enabling DC6 is blocked (by holding a reference on the DC_off >> >> >> power well). >> >> >> >> >> >> > --- >> >> >> > drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++--- >> >> >> > 1 file changed, 3 insertions(+), 3 deletions(-) >> >> >> > >> >> >> > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c >> >> >> > b/drivers/gpu/drm/i915/display/intel_dmc.c >> >> >> > index 2fb6fec6dc99..169bbbc91f6d 100644 >> >> >> > --- a/drivers/gpu/drm/i915/display/intel_dmc.c >> >> >> > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c >> >> >> > @@ -1570,10 +1570,10 @@ void >> >> >> > intel_dmc_update_dc6_allowed_count(struct intel_display *display, >> >> >> > struct intel_dmc *dmc = display_to_dmc(display); >> >> >> > u32 dc5_cur_count; >> >> >> > >> >> >> > - if (DISPLAY_VER(dmc->display) < 14) >> >> >> > + if (!dmc || DISPLAY_VER(display) < 14) >> >> >> > return; >> >> >> > >> >> >> > - dc5_cur_count = intel_de_read(dmc->display, >> >> >> > DG1_DMC_DEBUG_DC5_COUNT); >> >> >> > + dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT); >> >> >> > >> >> >> > if (!start_tracking) >> >> >> > dmc->dc6_allowed.count += dc5_cur_count - >> >> >> > dmc->dc6_allowed.dc5_start; >> >> >> > @@ -1587,7 +1587,7 @@ static bool >> >> >> > intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 * >> >> >> > struct intel_dmc *dmc = display_to_dmc(display); >> >> >> > bool dc6_enabled; >> >> >> > >> >> >> > - if (DISPLAY_VER(display) < 14) >> >> >> > + if (!dmc || DISPLAY_VER(display) < 14) >> >> >> > return false; >> >> >> > >> >> >> > mutex_lock(&power_domains->lock); >> >> >> > -- >> >> >> > 2.47.3 >> >> >> > >> >> >> >> -- >> >> Jani Nikula, Intel >> >> -- >> Jani Nikula, Intel -- Jani Nikula, Intel
