On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote:
> On Tue, 02 Dec 2025, Imre Deak <[email protected]> wrote:
> > On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
> >> On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
> >> > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
> >> > initialized, and dmc is thus NULL.
> >> >
> >> > That would be the case when the call path is
> >> > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
> >> > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
> >> > intel_power_domains_init_hw() is called *before* intel_dmc_init().
> >> >
> >> > However, gen9_set_dc_state() calls intel_dmc_update_dc6_allowed_count()
> >> > conditionally, depending on the current and target DC states. At probe,
> >> > the target is disabled, but if DC6 is enabled, the function is called,
> >> > and an oops follows. Apparently it's quite unlikely that DC6 is enabled
> >> > at probe, as we haven't seen this failure mode before.
> >> >
> >> > Add NULL checks and switch the dmc->display references to just display.
> >> >
> >> > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6
> >> > counter")
> >> > Cc: Mohammed Thasleem <[email protected]>
> >> > Cc: Imre Deak <[email protected]>
> >> > Cc: <[email protected]> # v6.16+
> >> > Signed-off-by: Jani Nikula <[email protected]>
> >> >
> >> > ---
> >> >
> >> > Rare case, but this may also throw off the rc6 counting in debugfs when
> >> > it does happen.
> >>
> >> Yes, I missed the case where the driver is being loaded while DC6 is
> >> enabled, this is what happens for the reporter:
> >>
> >> i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
> >>
> >> That's odd, as DC6 requires the DMC firmware, which - if it's indeed
> >> loaded by BIOS for instance - will be overwritten by the driver, not a
> >> well specified sequence (even though the driver is trying to handle it
> >> correctly by disabling any active firmware handler).
> >>
> >> But as you pointed out this would also throw off the cooked-up DC6
> >> counter tracking,
> >
> > Actually the patch would keep the counter working, as the counter
> > wouldn't be updated in the dmc==NULL case. However I still think the
> > correct fix would be to check the correct DC state, which from the POV
> > of the counter tracking is the driver's version of the state, not the HW
> > state.
>
> One thing I failed to mention is that this happens in a KASAN run in
> QEMU. So I'm kind of not surprised we haven't hit this before. And it
> impacts the deductions about the DC state.
Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register
to a non-zero value then. But in any case the driver should handle it.
> I'm not quite sure what exactly you're suggesting, maybe a draft patch
> would communicate the idea better than plain English? ;)
intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as
the debugfs entry can be read at any point. With that, what I meant is:
in gen9_set_dc_state():
...
- dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
+ dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
...
in intel_dmc_get_dc6_allowed_count():
...
if (DISPLAY_VER(display) < 14)
return false;
+ if (!dmc) {
+ *count = 0;
+ return true;
+ }
+
mutex_lock(&power_domains->lock);
- dc6_enabled = intel_de_read(display, DC_STATE_EN) &
- DC_STATE_EN_UPTO_DC6;
+ dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
...
> Anyway, I think "not oopsing" is a lot better than "inaccurate DC
> counters in debugfs".
Agreed, the above would ensure both.
>
> BR,
> Jani.
>
>
> >
> >> so could instead the counter update depend on the
> >> driver's DC state instead of the HW state? I.e. set
> >> gen9_set_dc_state()/dc6_was_enabled,
> >> intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state
> >> says that DC6 was indeed enabled by the driver (instead of checking the
> >> HW state).
> >>
> >> That would fix the reporter's oops when calling
> >> intel_dmc_update_dc6_allowed_count(start_tracking=false), by not calling
> >> it if the driver hasn't actually enabled DC6 and it would also keep the
> >> DC6 counter tracking correct.
> >>
> >> intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also
> >> guaranteed to be called only once the firmware is loaded, as until that
> >> point enabling DC6 is blocked (by holding a reference on the DC_off
> >> power well).
> >>
> >> > ---
> >> > drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++---
> >> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c
> >> > b/drivers/gpu/drm/i915/display/intel_dmc.c
> >> > index 2fb6fec6dc99..169bbbc91f6d 100644
> >> > --- a/drivers/gpu/drm/i915/display/intel_dmc.c
> >> > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c
> >> > @@ -1570,10 +1570,10 @@ void intel_dmc_update_dc6_allowed_count(struct
> >> > intel_display *display,
> >> > struct intel_dmc *dmc = display_to_dmc(display);
> >> > u32 dc5_cur_count;
> >> >
> >> > - if (DISPLAY_VER(dmc->display) < 14)
> >> > + if (!dmc || DISPLAY_VER(display) < 14)
> >> > return;
> >> >
> >> > - dc5_cur_count = intel_de_read(dmc->display,
> >> > DG1_DMC_DEBUG_DC5_COUNT);
> >> > + dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
> >> >
> >> > if (!start_tracking)
> >> > dmc->dc6_allowed.count += dc5_cur_count -
> >> > dmc->dc6_allowed.dc5_start;
> >> > @@ -1587,7 +1587,7 @@ static bool intel_dmc_get_dc6_allowed_count(struct
> >> > intel_display *display, u32 *
> >> > struct intel_dmc *dmc = display_to_dmc(display);
> >> > bool dc6_enabled;
> >> >
> >> > - if (DISPLAY_VER(display) < 14)
> >> > + if (!dmc || DISPLAY_VER(display) < 14)
> >> > return false;
> >> >
> >> > mutex_lock(&power_domains->lock);
> >> > --
> >> > 2.47.3
> >> >
>
> --
> Jani Nikula, Intel