On 2026-05-29 at 14:49:29 GMT, Janusz Krzysztofik wrote:
> On Thu, 2026-05-28 at 17:24 +0200, Krzysztof Niemiec wrote:
> > On 2026-05-25 at 10:16:32 GMT, Janusz Krzysztofik wrote:
> > > Hi Krzysztof,
> > > 
> > > On Fri, 2026-05-22 at 17:40 +0200, Krzysztof Niemiec wrote:
> > > > The dev->registered variable was initially added in such a way that it
> > > > is only set after all functions that could have failed have been called
> > > > and haven't returned an error. However, since then, the function
> > > > drm_modeset_register_all() is being checked for its return value, which
> > > > opens a possibility of failing the register after setting the registered
> > > > variable anyway. drm_dev_register() cleans up after itself in case of
> > > > failure, mimicking the functions called in drm_dev_unregister(), with
> > > > the exception of drm_client_sysrq_unregister() and
> > > > drm_panic_unregister() (_register() counterparts of which are not
> > > > checked for their return values), and the dev->registered flag.
> > > > 
> > > > This creates a situation in which after calling drm_dev_register()
> > > > nothing is registered (as the function entered an error path and cleaned
> > > > up), but the device is reported as being registered according to the
> > > > dev->registered variable.
> > > > 
> > > > This, for example, confuses the WARN_ON() in 
> > > > drm_mode_object_register(), which
> > > > is raised if a non-dynamic mode object is attempted to be unregistered
> > > > in between the drm_dev_register() and drm_dev_unregister() calls. If
> > > > drm_dev_register() fails - meaning the device *isn't* registered -
> > > > currently it still reports dev->registered = true, which triggers the
> > > > WARN_ON(), even though the usage is correct in that instance.
> > > > 
> > > > Set dev->registered back to false in the error path to prevent this.
> > > 
> > > While you've also submitted a patch ("drm/i915: Remove 
> > > drm_dev_unregister()
> > > from the error path during i915_driver_register()") that adjusts i915
> > > handling of register / unregister steps after this fix, have you checked 
> > > if
> > > other drivers are prepared for that and won't be affected?
> > 
> > I just did; there'd be a problem if in the error path triggered by
> > failing drm_dev_register, some driver relies on the dev->registered
> > variable to be true to do its work.
> 
> OK, maybe my doubts about other drivers were groundless.  The flag is
> described as DRM internal after all, and cleaning it in the error unwind
> path seems fully justified.  However, please review your commit
> description.  Trying to understand why drm_mode_object_register() might
> call WARN_ON() when dev->registered is true, I found it doesn't, and that's
> probably its _unregister() counterpart that may do that.  With no stack
> trace attached, that's confusing for a reviewer.  Such WARN_ON()s may be
> hit not only due to inconsistencies internal to DRM, but also as a result
> of drivers incorrectly / unexpectedly calling some DRM functions.

Yes, I've made a mistake in the commit log, I meant _unregister() - see
[1]. Do you think going through how i915, specifically, hits this
WARN_ON() would be a sufficient example?

Thanks,
Krzysztof

[1] 
https://elixir.bootlin.com/linux/v7.0.10/source/drivers/gpu/drm/drm_mode_object.c#L106-L117

> 
> Thanks,
> Janusz
> 
> 
> > 
> > There are two main patterns of doing the error path in the DRM drivers
> > I've seen - one is just directly returning from the overarching function
> > with drm_dev_register()'s error return value. The other is invoking the
> > .unbind handler of the driver via component_unbind_all(), and most of
> > the time those call drm_dev_unregister() right away, which sets the
> > dev->registered value to false anyway. In these two cases (if
> > drm_dev_unregister() is called in the second one), the problem doesn't
> > affect the driver at all.
> > 
> > In the case of i915 not setting the variable to false on failure
> > triggered a WARN_ON() in drm_mode_object_unregister() because that is
> > invoked indirectly in the error path due to the complexity of our
> > init code (even xe just returns directly instead). If another driver
> > calls that function in its error path, it will just silence (correctly)
> > the WARN_ONs there.
> > 
> > The only driver that explicitly checks for dev->registerd in its error
> > path is msm, which does that to see if it needs to invoke
> > drm_dev_unregister() during unload. This patch actually fixes a bug there
> > because otherwise _unregister() would be called even if _register() fails,
> > which is not correct.
> > 
> > All other uses of the variable are outside the error path of their
> > respective drivers, so they should function as intended even after the
> > change.
> > 
> > Thanks
> > Krzysztof
> > 
> > > 
> > > Thanks,
> > > Janusz
> > > 
> > > 
> > > > 
> > > > Signed-off-by: Krzysztof Niemiec <[email protected]>
> > > > ---
> > > >  drivers/gpu/drm/drm_drv.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > index 985c283cf59f..92403d79bb64 100644
> > > > --- a/drivers/gpu/drm/drm_drv.c
> > > > +++ b/drivers/gpu/drm/drm_drv.c
> > > > @@ -1116,6 +1116,7 @@ int drm_dev_register(struct drm_device *dev, 
> > > > unsigned long flags)
> > > >         if (dev->driver->unload)
> > > >                 dev->driver->unload(dev);
> > > >  err_minors:
> > > > +       dev->registered = false;
> > > >         remove_compat_control_link(dev);
> > > >         drm_minor_unregister(dev, DRM_MINOR_ACCEL);
> > > >         drm_minor_unregister(dev, DRM_MINOR_PRIMARY);

Reply via email to