On Wed, Jan 28, 2026 at 4:48 PM Johan Hovold <[email protected]> wrote:
>
> On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote:
> > On Mon, Jan 26, 2026 at 2:50 PM Johan Hovold <[email protected]> wrote:
>
> > > It's certainly possible to handle the chardev unplug issue without
> > > revocable as several subsystems already do. All you need is a refcount,
> > > a lock and a flag.
> > >
> > > It may be possible to provide a generic solutions at the chardev level
> > > or some kind of helper implementation (similar to revocable) for
> > > subsystems to use directly.
> >
> > This echoes the heated exchange I recently had with Johan elsewhere so
> > I would like to chime in and use the wider forum of driver core
> > maintainers to settle an important question. It seems there are two
> > camps in this discussion: one whose perception of the problem is
> > limited to character devices being referenced from user-space at the
> > time of the driver unbind (favoring fixing the issues at the vfs
> > level) and another extending the problem to any driver unbinding where
> > we cannot ensure a proper ordering of the teardown (for whatever
> > reason: fw_devlink=off, helper auxiliary devices acting as
> > intermediates, or even user-space unbinding a driver manually with
> > bus-level sysfs attributes) leaving consumers of resources exposed by
> > providers that are gone with dangling references (focusing the
> > solutions on the subsystem level).
>
> What I've been trying to get across is that the chardev hot-unplug issue
> is real and needs to be fixed where it still exists, while the manual
> unbinding of drivers by root is a corner case which does not need to be
> addressed at *any* cost.
>
> If addressing the latter by wrapping every resource access in code that
> adds enough runtime overhead and makes drivers harder to write and
> maintain it *may* not be worth it and we should instead explore
> alternatives.
>

Alright, so we *do* agree at least on some parts. :)

I agree that any such change should not affect drivers. If you look at
the GPIO changes I did or the proposed nvmem rework - it never touched
drivers, only the subsystem level code. The latter especially is
really tiny, in fact:

 drivers/nvmem/core.c      | 172 +++++++++++++++++++++++---------------
 drivers/nvmem/internals.h |  17 +++-

is all you need to make it not crash in the situations I described
under that series. Runtime overhead in read-sections with SRCU or
read-write semaphores is negligible and typically we only have to
write on driver unbind. So that "wrapping every resource access"
sounds scary but really is not. GPIO work was bigger but it addressed
way more synchronization issues than just supplier unbinding.

For I2C both the problem (subsystem waiting forever for consumers to
release all references) and the culprit are different: memory used to
hold the reference-counted struct device is released on supplier
unbind unconditionally. Unfortunately there's no way around it other
than to first move it into a separate chunk managed by i2c core. But
that's not the synchronization part that leaks into the drivers, just
the need to move struct device out of struct i2c_adapter.

> This may involve tracking consumers like fw_devlink already does today
> so that they are unbound before their dependencies are.
>

During Saravana's talk at LPC we did briefly speak about whether it
would be possible to enforce devlinks for ALL devices linked in a
consumer-supplier fashion. I did in fact look into it for a bit on my
way back and it too would require at least subsystem-level changes
across all subsystems because you need to add that entry point at the
time of the resource being requested, so it's not a no-cost operation.
But it is an alternative, yes, though it'll require a comparable
amount of gap-plugging IMO.

> Because in the end, how sound is a model where we allow critical
> resources to silently go away while a device is still in use (e.g. you
> won't discover that your emergency shutdown gpio is gone until you
> actually need it)?
>

Well, we do allow it at the moment. It doesn't seem like devlink will
be able to cover 100% of use-cases anytime soon.

Bartosz
