On Wed, 27 May 2026, Ville Syrjälä <[email protected]> wrote: > On Wed, May 27, 2026 at 07:35:32PM +0300, Jani Nikula wrote: >> On Wed, 27 May 2026, Ville Syrjälä <[email protected]> wrote: >> > This stuff is not meant for runtime pm. xe just has some >> > obnoxious hacks in its runtime pm code to allow it to call >> > incorrect functions from its runtime pm paths without >> > deadlocks/etc. >> > >> > I think the xe hacks need to be killed and runtime pm >> > implemented there *correctly* before we base any common >> > code on the xe implementation. >> > >> > We should perhaps start from the i915 implementation >> > instead. That might help properly highlight all the >> > bogus things that xe is doing. >> >> There are a few reasons why I chose to start off with xe like this. >> >> The granularity of functions are a fairly good starting point for a >> shared implementation. Simply moving them over from xe to display in a >> non-functional way reduces xe_display.c dependency deep into display >> functionality. It's forward progress with no risk for regressions. >> >> Sure, we could define similar functions for i915 to call, but that's >> going to contain functional changes from about patch #1, because i915 >> calls deep into display in a very scattered way. With the approach at >> hand, we can gradually move i915 over to the new stuff, even function by >> function, comparing the sequences, making small changes to either along >> the way, as the case may be. >> >> From my POV the end result is going to be the same. The difference is in >> the path we choose. >> >> Of course, there's also the problem that I don't know for sure what all >> the hacks are that you refer to, or what implementing runtime PM >> correctly there means. The d3cold stuff (including those >> intel_display_power_disable/enable() calls) is hidden behind a flag that >> only gets called for xe, which I guess is a bit lame, but also isolates >> it from the rest. > > The main issue is that xe calls the wrong things and thus ends up > calling runtime_pm_get/put() from within the runtime suspend/resume > hooks, which is completely wrong. IIRC that would normally just > deadlock but there is some hack deep in the guts of the xe that > skips the actual rpm stuff and just adjusts the refcount. And that > brings along an implicit assumption that the device is still awake > enough to actually work correctly for whatever the functions that > takes the rpm ref needs. So it all works by accident, not by design. > And it could very well break at any time by some innocent looking > change to any of those functions that shouldn't be even be called. > > In i915, with its proper runtime pm implementation, we simply > can't call any of those things from the runtime pm hooks. So someone > will need to identify all those functions, come up with a proper way > to do what needs to be done, and then nuke the xe hacks. Only after > that we have any real chance of converting i915 to use that code. > > The other direction might allow us to proceed a bit further in > the unification before the xe hacks need to be fixed, because > anything i915 calls will be safe to also call in xe. > > I'm also not sure xe is even calling the right things in the right > order for the things that it should be calling. Comparing with the > i915 code should usually tell us that, but until the i915 bits have > been extracted to similar functions it's probably a bit hard to see > what the differences are.
Fair enough. I'm dropping this series, and taking a completely different approach in [1]. BR, Jani. [1] https://lore.kernel.org/r/[email protected] -- Jani Nikula, Intel
