On Tue, Dec 21, 2021 at 1:47 PM Deucher, Alexander
<[email protected]> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Deucher, Alexander
> > Sent: Tuesday, December 21, 2021 12:01 PM
> > To: Linus Torvalds <[email protected]>; Imre Deak
> > <[email protected]>; [email protected]
> > Cc: Daniel Vetter <[email protected]>; Kai-Heng Feng
> > <[email protected]>
> > Subject: RE: Expecting to revert commit 55285e21f045 "fbdev/efifb: Release
> > PCI device ..."
> >
> > > -----Original Message-----
> > > From: Linus Torvalds <[email protected]>
> > > Sent: Monday, December 20, 2021 5:05 PM
> > > To: Imre Deak <[email protected]>
> > > Cc: Daniel Vetter <[email protected]>; Deucher, Alexander
> > > <[email protected]>; Kai-Heng Feng
> > > <[email protected]>
> > > Subject: Re: Expecting to revert commit 55285e21f045 "fbdev/efifb:
> > > Release PCI device ..."
> > >
> > > On Mon, Dec 20, 2021 at 1:33 PM Imre Deak <[email protected]>
> > wrote:
> > > >
> > > > amdgpu.runpm=0
> > >
> > > Hmmm.
> > >
> > > This does seem to "work", but not very well.
> > >
> > > With this, what seems to happen is odd: I lock the screen, wait, it
> > > goes "No signal, shutting down", but then doesn't actually shut down
> > > but stays black (with the backlight on). After _another_ five seconds
> > > or so, the monitor goes "No signal, shutting down" _again_, and at that
> > point it actually does it.
> > >
> > > So it solves my immediate problem - in that yes, the backlight finally
> > > does turn off in the end - but it does seem to be still broken.
> > >
> > > I'd be very surprised if no AMD drm developers can see this exact same
> > > thing.  This is a very simple setup. The only possibly slightly less
> > > common thing is that I have two monitors, but while that is not
> > > necessarily the _most_ common setup in an absolute sense, I'd expect it
> > > to be very common among DRM developers...
> > >
> > > I guess I can just change the revert to just a
> > >
> > >     -int amdgpu_runtime_pm = -1;
> > >     +int amdgpu_runtime_pm = 0;
> > >
> > > instead. The auto-detect is apparently broken. Maybe it should only
> > > kick in for LVDS screens on actual laptops?
> > >
> > > Note: on my machine, I get that
> > >
> > >    amdgpu 0000:49:00.0: amdgpu: Using BACO for runtime pm
> > >
> > > so maybe the other possible runtime pm models (ATPX and BOCO) are ok,
> > > and it's only that BACO case that is broken.
> > >
> > > I have no idea what any of those three things are - I'm just looking
> > > at the uses of that amdgpu_runtime_pm variable.
> > >
> > > amdgpu people: if you don't want that amdgpu_runtime_pm turned off by
> > > default, tell me something else to try.
> >
> > For a little background, runtime PM support was added about 10 years ago,
> > originally to support laptops with multiple GPUs (integrated and discrete).
> > It's not specific to the display hardware.  When the GPU is idle, it can be
> > powered down completely.  In the case of these laptops, it's D3 cold
> > (managed by ACPI, we call this BOCO in AMD parlance - Bus Off, Chip Off)
> > which powers off the dGPU completely (i.e., it disappears from the bus).  A
> > few years ago we extended this to support desktop dGPUs as well which
> > support their own version of runtime D3 (called BACO in AMD parlance - Bus
> > Active, Chip Off).  The driver can put the chip into a low power state where
> > everything except the bus interface is powered down (to avoid the device
> > disappearing from the bus).  So this has worked for almost 2 years now on
> > BACO capable parts and for a decade or more on BOCO systems.
> > Unfortunately, changing the default runpm parameter setting would cause a
> > flood of bug reports about runtime power management breaking and systems
> > suddenly using more power.
> >
> > Imre's commit (55285e21f045) fixes another commit (a6c0fd3d5a8b).
> > Runtime pm was working on amdgpu prior to that commit.  Is it possible
> > there is still some race between when amdgpu takes over from efifb?  Does
> > it work properly when all the pm_runtime calls in efifb are removed, or
> > if efifb is not enabled?  Runtime pm for Polaris boards has been enabled
> > by default since 4fdda2e66de0b, which predates both of those patches.
>
> Thinking about this more, I wonder if there was some change in some userspace 
> component which was hidden by the changes in 55285e21f045 and a6c0fd3d5a8b.  
> E.g., some desktop component started polling for display changes or GPU 
> temperature or something like that and when a6c0fd3d5a8b was in place the GPU 
> never entered runtime suspend.  Then when 55285e21f045 was applied, it 
> unmasked the new behavior in the userspace component.
>
> What should happen is that when all of the displays blank, assuming the
> GPU is otherwise idle, the GPU will runtime suspend after a short delay.
> When you move the mouse or hit the keyboard, that should trigger the GPU
> to runtime resume, and then the displays will be re-enabled.
>
> In the behavior you are seeing, when the displays come back on after they
> blank, do you see the device resume from runtime suspend?  On resume from
> suspend (runtime or system) we issue a hotplug notification to userspace,
> in case any displays changed while the GPU was powered down (and hence
> could not detect a hotplug event).  Perhaps the device runtime resumes due
> to some other event, and that hotplug notification then triggers userspace
> to reprobe and re-enable the displays shortly after resume.  Does
> something like this help by any chance?
> https://bugzilla.kernel.org/attachment.cgi?id=300103&action=diff&collapsed=&headers=1&format=raw
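
For anyone trying to answer the "is it actually runtime suspending" question
from userspace, one way is to watch the PCI device's standard runtime PM state
in sysfs.  This is just a sketch: 0000:49:00.0 is the address from the dmesg
line Linus quoted, so substitute your own dGPU's address.

```shell
# PCI address taken from the "Using BACO for runtime pm" dmesg line above;
# substitute your own dGPU's address.
DEV=/sys/bus/pci/devices/0000:49:00.0/power

# runtime_status reads "active" while the GPU is up and "suspended" once it
# has actually entered runtime suspend; poll it once a second while blanking
# the displays.
while sleep 1; do cat "$DEV/runtime_status"; done
```

If the status never reaches "suspended" with the displays blanked, the GPU is
being held awake (or repeatedly woken) before it can power down.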

I've been working with the reporters of the following bug reports.
Probably best to follow the progress there:
https://bugzilla.kernel.org/show_bug.cgi?id=215203
https://gitlab.freedesktop.org/drm/amd/-/issues/1840
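
For anyone else following along who wants to test with runtime pm forced off
(the amdgpu.runpm=0 workaround Imre suggested) without editing the kernel
command line, the module option can also be set through the stock modprobe
config mechanism; the file name below is just an example.

```shell
# Equivalent to booting with amdgpu.runpm=0; delete the file to go back to
# the default auto-detect (-1).
echo "options amdgpu runpm=0" | sudo tee /etc/modprobe.d/amdgpu-runpm.conf

# The value currently in effect can be checked at runtime:
cat /sys/module/amdgpu/parameters/runpm
```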

Alex
