On Thu, 04 Dec 2025, Imre Deak <[email protected]> wrote:
> On Thu, Dec 04, 2025 at 01:30:27PM +0200, Jani Nikula wrote:
>> On Wed, 03 Dec 2025, Imre Deak <[email protected]> wrote:
>> > On Wed, Dec 03, 2025 at 10:13:44AM +0200, Jani Nikula wrote:
>> >> On Tue, 02 Dec 2025, Imre Deak <[email protected]> wrote:
>> >> > On Tue, Dec 02, 2025 at 11:24:42PM +0200, Imre Deak wrote:
>> >> >> On Tue, Dec 02, 2025 at 08:39:50PM +0200, Jani Nikula wrote:
>> >> >> > intel_dmc_update_dc6_allowed_count() oopses when DMC hasn't been
>> >> >> > initialized, and dmc is thus NULL.
>> >> >> > 
>> >> >> > That would be the case when the call path is
>> >> >> > intel_power_domains_init_hw() -> {skl,bxt,icl}_display_core_init() ->
>> >> >> > gen9_set_dc_state() -> intel_dmc_update_dc6_allowed_count(), as
>> >> >> > intel_power_domains_init_hw() is called *before* intel_dmc_init().
>> >> >> > 
>> >> >> > However, gen9_set_dc_state() calls 
>> >> >> > intel_dmc_update_dc6_allowed_count()
>> >> >> > conditionally, depending on the current and target DC states. At 
>> >> >> > probe,
>> >> >> > the target is disabled, but if DC6 is enabled, the function is 
>> >> >> > called,
>> >> >> > and an oops follows. Apparently it's quite unlikely that DC6 is 
>> >> >> > enabled
>> >> >> > at probe, as we haven't seen this failure mode before.
>> >> >> > 
>> >> >> > Add NULL checks and switch the dmc->display references to just 
>> >> >> > display.
>> >> >> > 
>> >> >> > Fixes: 88c1f9a4d36d ("drm/i915/dmc: Create debugfs entry for dc6 
>> >> >> > counter")
>> >> >> > Cc: Mohammed Thasleem <[email protected]>
>> >> >> > Cc: Imre Deak <[email protected]>
>> >> >> > Cc: <[email protected]> # v6.16+
>> >> >> > Signed-off-by: Jani Nikula <[email protected]>
>> >> >> > 
>> >> >> > ---
>> >> >> > 
>> >> >> > Rare case, but this may also throw off the rc6 counting in debugfs 
>> >> >> > when
>> >> >> > it does happen.
>> >> >> 
>> >> >> Yes, I missed the case where the driver is being loaded while DC6 is
>> >> >> enabled, this is what happens for the reporter:
>> >> >> 
>> >> >> i915 0000:00:04.0: [drm] *ERROR* DC state mismatch (0x0 -> 0x2)
>> >> >> 
>> >> >> That's odd, as DC6 requires the DMC firmware, which - if it's indeed
>> >> >> loaded by BIOS for instance - will be overwritten by the driver, not a
>> >> >> well specified sequence (even though the driver is trying to handle it
>> >> >> correctly by disabling any active firmware handler).
>> >> >> 
>> >> >> But as you pointed out this would also throw off the cooked-up DC6
>> >> >> counter tracking,
>> >> >
>> >> > Actually the patch would keep the counter working, as the counter
>> >> > wouldn't be updated in the dmc==NULL case. However I still think the
>> >> > correct fix would be to check the correct DC state, which from the POV
>> >> > of the counter tracking is the driver's version of the state, not the HW
>> >> > state.
>> >> 
>> >> One thing I failed to mention is that this happens in a KASAN run in
>> >> QEMU. So I'm kind of not surprised we haven't hit this before. And it
>> >> impacts the deductions about the DC state.
>> >
>> > Ok, it's strange why QEMU decides to initialize the DC_STATE_EN register
>> > to a non-zero value then. But in any case the driver should handle it.
>> >
>> >> I'm not quite sure what exactly you're suggesting, maybe a draft patch
>> >> would communicate the idea better than plain English? ;)
>> >
>> > intel_dmc_get_dc6_allowed_count() still needs to check for dmc==NULL, as
>> > the debugfs entry can be read at any point. With that, what I meant is:
>> >
>> > in gen9_set_dc_state():
>> > ...
>> > -       dc6_was_enabled = val & DC_STATE_EN_UPTO_DC6;
>> > +       dc6_was_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
>> 
>> I still don't understand why we can trust our own value rather than
>> what's in the hardware in this case.
>
> The BIOS/FW can set random flags in the register, as in the above case,
> so it can't be trusted. The counter update ending the tracking of the
> duration of a DC6 enabled state only works if the driver did in fact
> enable DC6 previously and has started the tracking accordingly. This is
> only guaranteed if the driver has set DC_STATE_EN_UPTO_DC6 in
> power_domains->dc_state, the corrsponding HW flag doesn't guarantee it.
>
>> For resume, we even call gen9_sanitize_dc_state(), but not for probe.
>
> After system suspend, the driver enabling DC9 by setting the
> corrsponding DC9 flag in the DC_STATE_EN register, the HW/firmware will
> disable DC9 while resuming. The SW version of the DC state will be
> updated accordingly in the above function to reflect the disabled DC9
> state.
>
>> > ...
>> >
>> > in intel_dmc_get_dc6_allowed_count():
>> > ...
>> >         if (DISPLAY_VER(display) < 14)
>> >                 return false;
>> >  
>> > +       if (!dmc) {
>> > +               *count = 0;
>> > +               return true;
>> > +       }
>> > +
>> 
>> This seems neat but is overkill. dmc is never NULL here, but I added the
>> check for completeness.
>
> intel_dmc_get_dc6_allowed_count() shouldn't fall back on DISPLAY_VER>=14
> to report the DC6 residency in a way that only works for older
> platforms. Hence the function should return true for DISPLAY_VER>=14.
>
>> It's the intel_dmc_update_dc6_allowed_count() that's more fragile, and I
>> want that to have the !dmc check, instead of relying on the subtle
>> dependency on power_domains->dc_state.
>
> The counter tracking should depend on the power_domians->dc_state SW
> state as described above, so that's the correct thing to do there.
> dmc==NULL in intel_dmc_update_dc6_allowed_count() would be only a bug in
> the driver, if you wanted to check for that it should be a
> WARN_ON(!dmc) check.

So we still have the NULL pointer dereference at probe, albeit very
unlikely. I don't really know what to do here. Care to send a patch to
fix it?


BR,
Jani.



>
>> >         mutex_lock(&power_domains->lock);
>> > -       dc6_enabled = intel_de_read(display, DC_STATE_EN) &
>> > -                     DC_STATE_EN_UPTO_DC6;
>> > +       dc6_enabled = power_domains->dc_state & DC_STATE_EN_UPTO_DC6;
>> > ...
>> >
>> >> Anyway, I think "not oopsing" is a lot better than "inaccurate DC
>> >> counters in debugfs".
>> >
>> > Agreed, the above would ensure both.
>> >
>> >> 
>> >> BR,
>> >> Jani.
>> >> 
>> >> 
>> >> >
>> >> >> so could instead the counter update depend on the
>> >> >> driver's DC state instead of the HW state? I.e. set
>> >> >> gen9_set_dc_state()/dc6_was_enabled,
>> >> >> intel_dmc_get_dc6_allowed_count()/dc6_enable if power_domains->dc_state
>> >> >> says that DC6 was indeed enabled by the driver (instead of checking the
>> >> >> HW state).
>> >> >> 
>> >> >> That would fix the reporter's oops when calling
>> >> >> intel_dmc_update_dc6_allowed_count(start_tracking=false), by not 
>> >> >> calling
>> >> >> it if the driver hasn't actually enabled DC6 and it would also keep the
>> >> >> DC6 counter tracking correct.
>> >> >> 
>> >> >> intel_dmc_update_dc6_allowed_count(start_tracking=true) would be also
>> >> >> guaranteed to be called only once the firmware is loaded, as until that
>> >> >> point enabling DC6 is blocked (by holding a reference on the DC_off
>> >> >> power well).
>> >> >> 
>> >> >> > ---
>> >> >> >  drivers/gpu/drm/i915/display/intel_dmc.c | 6 +++---
>> >> >> >  1 file changed, 3 insertions(+), 3 deletions(-)
>> >> >> > 
>> >> >> > diff --git a/drivers/gpu/drm/i915/display/intel_dmc.c 
>> >> >> > b/drivers/gpu/drm/i915/display/intel_dmc.c
>> >> >> > index 2fb6fec6dc99..169bbbc91f6d 100644
>> >> >> > --- a/drivers/gpu/drm/i915/display/intel_dmc.c
>> >> >> > +++ b/drivers/gpu/drm/i915/display/intel_dmc.c
>> >> >> > @@ -1570,10 +1570,10 @@ void 
>> >> >> > intel_dmc_update_dc6_allowed_count(struct intel_display *display,
>> >> >> >      struct intel_dmc *dmc = display_to_dmc(display);
>> >> >> >      u32 dc5_cur_count;
>> >> >> >  
>> >> >> > -    if (DISPLAY_VER(dmc->display) < 14)
>> >> >> > +    if (!dmc || DISPLAY_VER(display) < 14)
>> >> >> >              return;
>> >> >> >  
>> >> >> > -    dc5_cur_count = intel_de_read(dmc->display, 
>> >> >> > DG1_DMC_DEBUG_DC5_COUNT);
>> >> >> > +    dc5_cur_count = intel_de_read(display, DG1_DMC_DEBUG_DC5_COUNT);
>> >> >> >  
>> >> >> >      if (!start_tracking)
>> >> >> >              dmc->dc6_allowed.count += dc5_cur_count - 
>> >> >> > dmc->dc6_allowed.dc5_start;
>> >> >> > @@ -1587,7 +1587,7 @@ static bool 
>> >> >> > intel_dmc_get_dc6_allowed_count(struct intel_display *display, u32 *
>> >> >> >      struct intel_dmc *dmc = display_to_dmc(display);
>> >> >> >      bool dc6_enabled;
>> >> >> >  
>> >> >> > -    if (DISPLAY_VER(display) < 14)
>> >> >> > +    if (!dmc || DISPLAY_VER(display) < 14)
>> >> >> >              return false;
>> >> >> >  
>> >> >> >      mutex_lock(&power_domains->lock);
>> >> >> > -- 
>> >> >> > 2.47.3
>> >> >> > 
>> >> 
>> >> -- 
>> >> Jani Nikula, Intel
>> 
>> -- 
>> Jani Nikula, Intel

-- 
Jani Nikula, Intel

Reply via email to