On Wed, 27 May 2026, Ville Syrjälä <[email protected]> wrote:
> On Wed, May 27, 2026 at 07:35:32PM +0300, Jani Nikula wrote:
>> On Wed, 27 May 2026, Ville Syrjälä <[email protected]> wrote:
>> > This stuff is not meant for runtime pm. xe just has some
>> > obnoxious hacks in its runtime pm code to allow it to call
>> > incorrect functions from its runtime pm paths without
>> > deadlocks/etc.
>> >
>> > I think the xe hacks need to be killed and runtime pm
>> > implemented there *correctly* before we base any common
>> > code on the xe implementation.
>> >
>> > We should perhaps start from the i915 implementation
>> > instead. That might help properly highlight all the
>> > bogus things that xe is doing.
>> 
>> There are a few reasons why I chose to start off with xe like this.
>> 
>> The granularity of functions are a fairly good starting point for a
>> shared implementation. Simply moving them over from xe to display in a
>> non-functional way reduces xe_display.c dependency deep into display
>> functionality. It's forward progress with no risk for regressions.
>> 
>> Sure, we could define similar functions for i915 to call, but that's
>> going to contain functional changes from about patch #1, because i915
>> calls deep into display in a very scattered way. With the approach at
>> hand, we can gradually move i915 over to the new stuff, even function by
>> function, comparing the sequences, making small changes to either along
>> the way, as the case may be.
>> 
>> From my POV the end result is going to be the same. The difference is in
>> the path we choose.
>> 
>> Of course, there's also the problem that I don't know for sure what all
>> the hacks are that you refer to, or what implementing runtime PM
>> correctly there means. The d3cold stuff (including those
>> intel_display_power_disable/enable() calls) is hidden behind a flag that
>> only gets called for xe, which I guess is a bit lame, but also isolates
>> it from the rest.
>
> The main issue is that xe calls the wrong things and thus ends up
> calling runtime_pm_get/put() from within the runtime suspend/resume
> hooks, which is completely wrong. IIRC that would normally just
> deadlock but there is some hack deep in the guts of the xe that
> skips the actual rpm stuff and just adjusts the refcount. And that
> brings along an implicit assumption that the device is still awake
> enough to actually work correctly for whatever the functions that
> takes the rpm ref needs. So it all works by accident, not by design.
> And it could very well break at any time by some innocent looking
> change to any of those functions that shouldn't be even be called.
>
> In i915, with its proper runtime pm implementation, we simply
> can't call any of those things from the runtime pm hooks. So someone
> will need to identify all those functions, come up with a proper way
> to do what needs to be done, and then nuke the xe hacks. Only after
> that we have any real chance of converting i915 to use that code.
>
> The other direction might allow us to proceed a bit further in
> the unification before the xe hacks need to be fixed, because
> anything i915 calls will be safe to also call in xe.
>
> I'm also not sure xe is even calling the right things in the right
> order for the things that it should be calling. Comparing with the
> i915 code should usually tell us that, but until the i915 bits have
> been extracted to similar functions it's probably a bit hard to see
> what the differences are.

Fair enough.

I'm dropping this series, and taking a completely different approach in
[1].


BR,
Jani.


[1] https://lore.kernel.org/r/[email protected]


-- 
Jani Nikula, Intel

Reply via email to