On Thu, Jan 29, 2026 at 10:11:46AM +0100, Bartosz Golaszewski wrote:
> On Wed, Jan 28, 2026 at 4:48 PM Johan Hovold <[email protected]> wrote:
> > On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote:
> > > On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <[email protected]> wrote:
> > > >
> > > > It's certainly possible to handle the chardev unplug issue without
> > > > revocable as several subsystems already do. All you need is a refcount,
> > > > a lock and a flag.
> > > >
> > > > It may be possible to provide a generic solutions at the chardev level
> > > > or some kind of helper implementation (similar to revocable) for
> > > > subsystems to use directly.
> > >
> > > This echoes the heated exchange I recently had with Johan elsewhere so
> > > I would like to chime in and use the wider forum of driver core
> > > maintainers to settle an important question. It seems there are two
> > > camps in this discussion: one whose perception of the problem is
> > > limited to character devices being referenced from user-space at the
> > > time of the driver unbind (favoring fixing the issues at the vfs
> > > level) and another extending the problem to any driver unbinding where
> > > we cannot ensure a proper ordering of the teardown (for whatever
> > > reason: fw_devlink=off, helper auxiliary devices acting as
> > > intermediates, or even user-space unbinding a driver manually with
> > > bus-level sysfs attributes) leaving consumers of resources exposed by
> > > providers that are gone with dangling references (focusing the
> > > solutions on the subsystem level).
> >
> > What I've been trying to get across is that the chardev hot-unplug issue
> > is real and needs to be fixed where it still exists, while the manual
> > unbinding of drivers by root is a corner case which does not need to be
> > addressed at *any* cost.
> >
> > If addressing the latter by wrapping every resource access in code that
> > adds enough runtime overhead and makes drivers harder to write and
> > maintain it *may* not be worth it and we should instead explore
> > alternatives.
>
> Alright, so we *do* agree at least on some parts. :)
>
> I agree that any such change should not affect drivers. If you look at
> the GPIO changes I did or the proposed nvmem rework - it never touched
> drivers, only the subsystem level code. The latter especially is
> really tiny, in fact:
>
>  drivers/nvmem/core.c      | 172 +++++++++++++++++++++++--------------
>  drivers/nvmem/internals.h |  17 +++-
>
> is all you need to make it not crash in the situations I described
> under that series. Runtime overhead in read-sections with SRCU or
> read-write semaphores is negligible and typically we only have to
> write on driver unbind. So that "wrapping every resource access"
> sounds scary but really is not.
>
> GPIO work was bigger but it addressed way more synchronization issues
> than just supplier unbinding.
>
> For I2C both the problem is different (subsystem waiting forever for
> consumers to release all references) and the culprit: memory used to
> hold the reference-counted struct device is released the supplier
> unbind unconditionally. Unfortunately there's no way around it other
> than to first move it into a separate chunk managed by i2c core.

Isn't there? Can't the driver-specific data structure be
reference-counted instead of unconditionally freed at unbind time?

> But
> that's not the synchronization part that leaks into the drivers, just
> the need to move struct device out of struct i2c_adapter.
>
> > This may involve tracking consumers like fw_devlink already does today
> > so that they are unbound before their dependencies are.
>
> During Saravana's talk at LPC we did briefly speak about whether it
> would be possible to enforce devlinks for ALL devices linked in a
> consumer-supplier fashion. I did in fact look into it for a bit on my
> way back and it too would require at least subsystem-level changes
> across all subsystems because you need to add that entry point at the
> time of the resource being requested so it's not a no-cost operation.
> But it is an alternative, yes though it'll require a comparable amount
> of gap-plugging IMO.

I recall at least one driver (omap3isp) having a circular resource
issue. The ISP hardware block has the ability to produce a clock for the
camera sensor, and the camera sensor is a resource acquired by the ISP
driver. It's quite rare, but it happens. I would however not reject a
solution that solves 99.99% of the problem without addressing this.

> > Because in the end, how sound is a model where we allow critical
> > resources to silently go away while a device is still in use (e.g. you
> > won't discover that your emergency shutdown gpio is gone until you
> > actually need it)?
>
> Well, we do allow it at the moment. It doesn't seem like devlink will
> be able to cover 100% of use-cases anytime soon.

We have this issue because designing resource management is hard. The
decision we made not to pay that cost has now turned into a huge
technical debt. There's no easy way around it: it won't be easier to
solve it correctly today than it was years ago.
I don't know when we will be able to fix the issue, but I know it will
happen only when we decide to face the situation and stop applying
band-aids. What I think is the biggest issue at the moment is the lack
of motivation/time/money to address this huge problem, but I'm hopeful
because I trust the technical expertise of the community.

--
Regards,

Laurent Pinchart
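
For readers following the thread, the "refcount, a lock and a flag"
pattern mentioned above, combined with reference-counting the provider
data instead of freeing it at unbind time, can be sketched roughly as
below. This is a userspace illustration only, not kernel code and not
anyone's actual patch: struct provider, provider_*() and
demo_unbind_while_in_use() are hypothetical names, a pthread mutex
stands in for whatever lock (SRCU, rw-semaphore) a subsystem would
really use, and a C11 atomic counter stands in for a kernel refcount.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical provider object; none of these names exist in the kernel. */
struct provider {
	atomic_int refcount;	/* keeps the memory alive for late consumers */
	pthread_mutex_t lock;	/* serializes resource access against unbind */
	bool gone;		/* set once the backing driver is unbound */
	int value;		/* stand-in for the real hardware resource */
};

static int provider_freed;	/* counts frees, for demonstration only */

static struct provider *provider_create(int value)
{
	struct provider *p = calloc(1, sizeof(*p));

	if (!p)
		return NULL;
	atomic_init(&p->refcount, 1);	/* reference owned by the "driver" */
	pthread_mutex_init(&p->lock, NULL);
	p->value = value;
	return p;
}

static struct provider *provider_get(struct provider *p)
{
	atomic_fetch_add(&p->refcount, 1);
	return p;
}

static void provider_put(struct provider *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
	if (old == 1) {		/* last reference dropped: free now */
		pthread_mutex_destroy(&p->lock);
		free(p);
		provider_freed++;
	}
}

/* Consumer access: fails cleanly once the provider has been unbound. */
static int provider_read(struct provider *p, int *out)
{
	int ret = 0;

	pthread_mutex_lock(&p->lock);
	if (p->gone)
		ret = -ENODEV;
	else
		*out = p->value;
	pthread_mutex_unlock(&p->lock);
	return ret;
}

/* Unbind: mark the resource gone, then drop only the driver's reference. */
static void provider_unbind(struct provider *p)
{
	pthread_mutex_lock(&p->lock);
	p->gone = true;
	pthread_mutex_unlock(&p->lock);
	provider_put(p);
}

/* Demo: the provider is unbound while a consumer still holds a reference. */
static int demo_unbind_while_in_use(void)
{
	struct provider *p = provider_create(42);
	struct provider *consumer;
	int v = 0, before, after;

	if (!p)
		return -ENOMEM;
	consumer = provider_get(p);
	before = provider_read(consumer, &v);	/* succeeds, v is 42 */
	provider_unbind(p);			/* supplier goes away */
	after = provider_read(consumer, &v);	/* -ENODEV, memory still valid */
	provider_put(consumer);			/* last put frees the object */
	return (before == 0 && v == 42 && after == -ENODEV) ? 0 : -1;
}
```

The point of the sketch is that unbind never frees memory a consumer can
still reach: it only flips the flag under the lock, and the object goes
away when the last reference is dropped. It does not address ordering
problems such as the circular omap3isp case.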
