Danilo Krummrich wrote:
> On Thu Jan 29, 2026 at 4:42 AM CET, dan.j.williams wrote:
> > Jason Gunthorpe wrote:
> >> On Thu, Jan 29, 2026 at 03:08:22AM +0200, Laurent Pinchart wrote:
> >> > > The latter already have robust schemes to help the driver shutdown and
> >> > > end the concurrent operations. ie cancel_work_sync(),
> >> > > del_timer_sync(), free_irq(), and *notifier_unregister().
> >> > 
> >> > One a side note, devm_request_irq() is another of the devm_* helpers
> >> > that cause race conditions, as interrupt handlers can run right after
> >> > .remove() returns, which drivers will most likely not handle correctly.
> >> 
> >> Yes! You *cannot* intermix devm and non-devm approaches without
> >> creating very subtle bugs exactly like this. If your subsystem does
> >> not provide a "devm register" helper its drivers shouldn't use devm.
> >
> > I wonder if we should have a proactive debug mode that checks for
> > idiomatic devres usage and flags:
> >
> > - registering devres actions while the driver is detached
> 
> In Rust we already enforce this through the type system, i.e. we even fail to
> compile the code when this is violated. (I.e. for all scopes that guarantee 
> that
> a device is bound (and will remain throughout) we give out a &Device<Bound>,
> which serves as a cookie.)
> 
> In C I don't really see how that would be possible without additional locking,
> as the only thing we could check is dev->driver, which obviously is racy.
> 
> > - registering devres actions for a device with a driver that has a
> >   .remove() method
> 
> This is perfectly valid, drivers might still be performing teardown operations
> on the device (if it did not get hot unplugged).

Yes, this one is a soft warning. It is perfectly valid, but the first
thing I look for in a device that uses devm in ->probe() and also has a
->remove() method is dependencies of the devm object on the ->remove()
managed object. That case is potentially mitigated if all resources are
devm acquired and no ->remove() is needed.

> > - passing a devres allocation to a kobject API
> 
> Well, this is an ownership violation. Technically, devres owns this allocation
> and devres releases the allocation when the device is unbound. Which makes 
> this
> allocation only ever valid to access from a scope that is guaranteed that the
> device is (and remains) bound.

To be clear I am talking about:

dev2 = devm_kzalloc(dev1...)
init(dev2)
device_register(dev2)

...where it must be valid past unbind of dev1.

> > - invoking devres release actions from a kobject release API
> 
> This is similar to "registering devres actions while the driver is detached" 
> and
> falls into the boarder problem class of "messing with devres objects outside 
> of
> a Device<Bound> scope".
> 
> Again, I think in C we can't properly protect against this. While we could 
> also
> give out a "Device<Bound>" token for scopes where we can guarantee that the
> device is actually bound to a driver in C, we can't control the lifetime of 
> the
> token as we can in Rust, which makes it pointless.

This is why Rust remains on my, learn when I have time, pile. The goal
would not be to "properly protect", but "sufficiently warn" the
unsuspecting. If anything is going to get me to convert a subsystem to
Rust it is to get help from the compiler for the review burden of the
above abuses.

> So, the best thing we can probably do is document that "This must only be 
> called
> from a scope that guarantees that the device remains bound throughout." for 
> all
> the devres accessors.

Agree.

> There may be one thing that comes to my mind that we could do though. While we
> can't catch those at compile time, we might be able to catch those on runtime.
> 
> For instance, we could "abuse" lockdep and register a fake lock for a
> Device<Bound> scope and put a lockdep assert into all the devres accessors.
> 
> However, fixing up all the violations that come up when introducing this 
> sounds
> like a huge pain. :)

Right, and as you said there are many valid uses that are not typically
recommended that would make the warning not useful.

Reply via email to