On Fri, May 10, 2024 at 03:11:13PM +0200, Jonas Ådahl wrote:
> On Fri, May 10, 2024 at 02:45:48PM +0200, Thomas Zimmermann wrote:
> > Hi
> > 
> > > (This was discussed on #dri-devel, but I'll reiterate here as well).
> > > 
> > > There are two problems at hand; one is the race condition during boot
> > > when the login screen (or whatever display server appears first) is
> > > launched with simpledrm, with the real GPU driver only appearing some
> > > moments later.
> > > 
> > > The other is general-purpose GPU hotplugging, including unplugging the
> > > GPU that the compositor decided is the primary one.
> > 
> > The situation of booting with simpledrm (problem 1) is a special case of
> > problem 2. From the kernel's perspective, unloading simpledrm is the same
> > as what you call general-purpose GPU hotplugging, even though there is no
> > full GPU, only a trivial scanout buffer. In userspace, you see the same
> > sequence of events as in the general case.
> 
> Sure, in a way it is, but the consequences and frequency of occurrence are
> quite different, so I think it makes sense to treat them as different
> problems, since they need different solutions. One is about fixing
> userspace components' support for arbitrary hotplugging, the other about
> mitigating the race condition that caused this discussion to begin with.

We're trying to document the hotunplug consensus here:

https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#device-hot-unplug

And yes hotunplug is really rough on userspace, but if that doesn't work,
we need to discuss what should be done instead in general. I agree with
Thomas that simpledrm really isn't special in that regard.
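
For context, the way userspace usually notices DRM devices coming and
going at all is a udev monitor on the "drm" subsystem. Rough, untested
sketch of the listening side (plain libudev, error handling trimmed;
what a compositor then actually does with the events is of course the
hard part):

/* Untested sketch: watch for DRM devices being added/removed via udev.
 * build: cc drm-monitor.c $(pkg-config --cflags --libs libudev) */
#include <libudev.h>
#include <poll.h>
#include <stdio.h>

int main(void)
{
    struct udev *udev = udev_new();
    struct udev_monitor *mon = udev_monitor_new_from_netlink(udev, "udev");

    udev_monitor_filter_add_match_subsystem_devtype(mon, "drm", NULL);
    udev_monitor_enable_receiving(mon);

    struct pollfd pfd = { .fd = udev_monitor_get_fd(mon), .events = POLLIN };

    while (poll(&pfd, 1, -1) > 0) {
        struct udev_device *dev = udev_monitor_receive_device(mon);

        if (!dev)
            continue;

        /* action is "add", "remove" or "change", devnode e.g. /dev/dri/card1 */
        if (udev_device_get_devnode(dev))
            printf("drm %s: %s\n", udev_device_get_action(dev),
                   udev_device_get_devnode(dev));

        udev_device_unref(dev);
    }

    udev_monitor_unref(mon);
    udev_unref(udev);
    return 0;
}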

> > > The latter is something that should be handled in userspace, by
> > > compositors, etc, I agree.
> > > 
> > > The former, however, is not properly solved by userspace learning how to
> > > deal with primary GPU unplugging and switching to using a real GPU
> > > driver, as it'd break the booting and login experience.
> > > 
> > > When it works, i.e. the race condition is not hit, it looks like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with the real GPU driver
> > >   * The login screen interface is smoothly animated using hardware
> > >     acceleration, presenting "advanced" graphical content depending on
> > >     hardware capabilities (e.g. high color bit depth, HDR, and so on)
> > > 
> > > If the race condition is hit, with a compositor supporting primary GPU
> > > hotplugging, it'll work like this:
> > > 
> > >   * System boots
> > >   * Plymouth shows a "splash" screen
> > >   * The login screen display server is launched with simpledrm
> > >   * Due to using simpledrm, the login screen interface is not animated
> > >     and just plops up, and no "advanced" graphical content is enabled due
> > >     to apparently missing hardware capabilities
> > >   * The real GPU driver appears, the login screen now starts to become
> > >     animated, and may suddenly change appearance due to capabilities
> > >     having changed
> > > 
> > > Thus, by just supporting hotplugging the primary GPU in userspace, we'll
> > > still end up with a glitchy boot experience, and it forces userspace to
> > > add things like sleep(10) to work around this.
> > > 
> > > In other words, fixing userspace is *not* a correct solution to the
> > > problem; it's a workaround (albeit behavior we want for other
> > > reasons) for the race condition.
> > 
> > To really fix the flickering, you need to read the old DRM device's atomic
> > state and apply it to the new device. Then tell the desktop and applications
> > to re-init their rendering stack.
> > 
> > Depending on the DRM driver and its hardware, it might be possible to do
> > this without flickering. The key is to not lose the original scanout
> > buffer while probing the new device driver. But that needs work in each
> > individual DRM driver.
> 
> This doesn't sound like it'll fix any of the flickering I describe.
> First, the loss of initial animation when the login interface appears is
> not something one can "fix", since it has already happened.
> 
> Avoiding flickering when switching to the new driver is only possible
> if one limits oneself to what simpledrm was capable of doing, i.e. no
> HDR signaling etc.

As long as you use the atomic ioctls (I think, at least) and the real
driver has full atomic state takeover support (only i915 to my knowledge),
and your userspace doesn't unnecessarily mess with the display state when
it takes over a new driver, then that should lead to a flicker-free boot
even across a simpledrm -> real driver takeover.

If your userspace doesn't crash&burn ofc :-)

But it's a real steep ask of all components to get this right.
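
To illustrate what "not messing with the display state" means in
practice: read back what the outgoing device currently has committed and
only reproduce exactly that on the new one. Untested sketch of the
readout half (real libdrm calls; the device path and the overall
takeover policy are assumptions on my part):

/* Untested sketch: dump the connector -> CRTC routing plus CRTC state that
 * a compositor would want to carry over unchanged on driver takeover. */
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t get_prop(int fd, uint32_t obj, uint32_t type, const char *name)
{
    drmModeObjectProperties *props = drmModeObjectGetProperties(fd, obj, type);
    uint64_t value = 0;

    for (uint32_t i = 0; props && i < props->count_props; i++) {
        drmModePropertyRes *prop = drmModeGetProperty(fd, props->props[i]);

        if (prop && !strcmp(prop->name, name))
            value = props->prop_values[i];
        drmModeFreeProperty(prop);
    }
    drmModeFreeObjectProperties(props);
    return value;
}

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC); /* assumed: simpledrm node */
    if (fd < 0)
        return 1;

    drmSetClientCap(fd, DRM_CLIENT_CAP_ATOMIC, 1);
    drmModeRes *res = drmModeGetResources(fd);

    for (int i = 0; res && i < res->count_connectors; i++) {
        uint32_t conn = res->connectors[i];
        uint32_t crtc = get_prop(fd, conn, DRM_MODE_OBJECT_CONNECTOR, "CRTC_ID");

        if (!crtc)
            continue;

        printf("connector %u -> crtc %u, ACTIVE=%llu\n", conn, crtc,
               (unsigned long long)get_prop(fd, crtc, DRM_MODE_OBJECT_CRTC, "ACTIVE"));
        /* MODE_ID is a blob; drmModeGetPropertyBlob() gives the actual mode. */
    }

    drmModeFreeResources(res);
    return 0;
}

The tricky half is then the matching commit on the new device, which is
where the driver-side state takeover mentioned above comes in.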

> > > Arguably, the only place where a more educated guess about whether to
> > > wait or not, and if so for how long, can be made is the kernel.
> > 
> > As I said before, driver modules come and go and hardware devices come and
> > go.
> > 
> > To detect if there might be a native driver waiting to be loaded, you can
> > test for
> > 
> > - 'nomodeset' on the command line -> no native driver
> 
> Makes sense to not wait here, and just assume simpledrm forever.
> 
> > - 'systemd-modules-load.service' not started -> maybe wait
> > - look for drivers under /lib/modules/<version>/kernel/drivers/gpu/drm/ ->
> > maybe wait
> 
> I suspect this is not useful for general-purpose distributions. I have
> 43 kernel GPU modules there on a F40 installation.
> 
> > - maybe udev can tell you more
> > - it might help detection that simpledrm devices recently started
> > referring to their parent PCI device
> > - maybe systemd tracks the probed devices
> 
> If the kernel already plumbs enough state so userspace components can
> make a decent decision, instead of just sleeping for an arbitrary amount
> of time, then great. This is to some degree what
> https://github.com/systemd/systemd/issues/32509 is about.

I think you can't avoid the timeout entirely for the use-case where the
user has disabled the real driver by not compiling it, and simpledrm would
be the only driver you'll ever get.

But that's just not going to happen on any default distro setup, so I
think it's ok if it sucks a bit.
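
For illustration, the detection heuristics quoted above could be glued
together roughly like this (untested sketch; the exact paths and the
wait/timeout policy are my assumptions, not a spec):

/* Untested sketch: decide whether it's worth waiting for a native driver. */
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

static bool cmdline_has_nomodeset(void)
{
    char buf[4096];
    size_t n = 0;
    FILE *f = fopen("/proc/cmdline", "r");

    if (f) {
        n = fread(buf, 1, sizeof(buf) - 1, f);
        fclose(f);
    }
    buf[n] = '\0';
    /* crude: this also matches e.g. a hypothetical "foo.nomodeset" option */
    return strstr(buf, "nomodeset") != NULL;
}

static bool drm_modules_installed(void)
{
    struct utsname uts;
    char path[512];
    DIR *dir;

    if (uname(&uts) < 0)
        return false;

    snprintf(path, sizeof(path),
             "/lib/modules/%s/kernel/drivers/gpu/drm", uts.release);

    /* crude: only checks that the directory exists, not what's in it */
    dir = opendir(path);
    if (!dir)
        return false;
    closedir(dir);
    return true;
}

int main(void)
{
    if (cmdline_has_nomodeset())
        puts("nomodeset: simpledrm is all we'll ever get, don't wait");
    else if (drm_modules_installed())
        puts("native DRM modules installed: wait (with a timeout) for a real device");
    else
        puts("no native drivers installed: don't wait");
    return 0;
}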

Cheers, Sima

> 
> 
> Jonas
> 
> > 
> > Best regards
> > Thomas
> > 
> > > 
> > > 
> > > Jonas
> > > 
> > > > The next best solution is to keep the final DRM device open until a
> > > > new one shows up. All DRM graphics drivers with hotplugging support
> > > > are required to accept commands after their hardware has been
> > > > unplugged. They simply won't display anything.
> > > > 
> > > > Best regards
> > > > Thomas
> > > > 
> > > > 
> > > > > Thanks
> > > > > 
> > 
> > -- 
> > --
> > Thomas Zimmermann
> > Graphics Driver Developer
> > SUSE Software Solutions Germany GmbH
> > Frankenstrasse 146, 90461 Nuernberg, Germany
> > GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
> > HRB 36809 (AG Nuernberg)
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
