Hi Krzysztof,

On Wed, Feb 12, 2025 at 12:50:17PM +0100, Krzysztof Niemiec wrote:
> On 2025-02-10 at 14:01:19 GMT, Andi Shyti wrote:
> > On Thu, Feb 06, 2025 at 07:07:38PM +0100, Janusz Krzysztofik wrote:
> > > We return immediately from i915_driver_register() if drm_dev_register()
> > > fails, skipping remaining registration steps. However, the _unregister()
> > > counterpart called at device remove knows nothing about that skip and
> > > executes reverts for all those steps. For that to work correctly, those
> > > revert functions must be resistant to being called even on uninitialized
> > > objects, or we must not skip their initialization.
> > >
> > > Three cases have been identified and fixes proposed. Call traces are
> > > taken from CI results of igt@i915_driver_load@reload-with-fault-injection
> > > execution, reported in several separate GitLab issues (links provided).
> > >
> > > Immediate return was introduced to i915_driver_register() by commit
> > > ec3e00b4ee27 ("drm/i915: stop registering if drm_dev_register() fails"),
> > > however, quite a few things have changed since then. That's why I haven't
> > > mentioned it in a Fixes: tag to avoid it being picked up by stable, which
> > > I haven't tested.
> >
> > I'm not fully convinced about this series as I think that you are
> > fixing a subset of what needs to be handled properly. What about
> > hwmon? What about gt? What about debugfs?
> >
> > In my opinion we need to check in _unregister whether the
> > drm_dev_register has succeeded and one way would be, e.g., to
> > check for the drm minor value, or even set the drm device to NULL
> > (first things that come to my mind, maybe there are smarter ways
> > of doing it). This way we could skip some of the _unregister()
> > steps.
> >
>
> Is there any situation in which having the driver loaded after failing
> drm_dev_register() is of any use? Because if not, instead of recording
> the fact of registration failure, we can just stop the driver from
> loading altogether by checking drm_dev_register()'s return value,
> then calling _only_ the required _unregister steps as cleanup in an
> error path, and propagating the result up to driver probe. This way we
> don't need to store any additional information at all.

As long as the driver works, why push it to fail? Janusz's
patch really shows that case.
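
To illustrate the "remember whether registration succeeded" idea from
my earlier mail, here is a rough, self-contained sketch in plain
userspace C. It is not i915 code: struct fake_dev, fake_register(),
fake_unregister() and the "registered" flag are all hypothetical names
made up for this example, and which real _unregister() steps can
actually be skipped would still need to be checked one by one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device; not a real i915 structure. */
struct fake_dev {
	bool registered;    /* set only if the core registration succeeded */
	bool extras_ready;  /* stands for the steps skipped on failure */
	int teardown_calls; /* how many times the extras were torn down */
};

/* Mimics the early return: on failure the remaining steps are skipped. */
static void fake_register(struct fake_dev *dev, int core_ret)
{
	assert(dev != NULL);
	*dev = (struct fake_dev){ 0 };

	if (core_ret) /* core registration failed, skip the rest */
		return;

	dev->registered = true;
	dev->extras_ready = true;
}

/* Reverts only the steps that were really executed at register time. */
static void fake_unregister(struct fake_dev *dev)
{
	assert(dev != NULL);

	if (!dev->registered) /* nothing was set up, nothing to revert */
		return;

	dev->extras_ready = false;
	dev->teardown_calls++;
	dev->registered = false;
}
```

With this, calling fake_unregister() after a failed fake_register() is
a no-op, while the device itself can stay around.
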
Andi