On Tue, Jan 27, 2026 at 10:18:27PM +0100, Bartosz Golaszewski wrote: > maintainers to settle an important question. It seems there are two > camps in this discussion: one whose perception of the problem is > limited to character devices being referenced from user-space at the > time of the driver unbind (favoring fixing the issues at the vfs > level) and another extending the problem to any driver unbinding where > we cannot ensure a proper ordering of the teardown (for whatever > reason:
I don't think you can defend any position where the user can do echo XYZ > /sys/.../YY and the kernel has an oops. Meaing in this discussion if the user does echo ".." > /sys/bus/XX/drivers/YY/unbind The kernel shouldn't oops or warn. I've seen many kinds of bogus arguments over the years (especially misunderstanding what the module refcount does!!), but ultimately I think this is the generally agreed expectation. However, in practice this isn't a common work flow, it is quite alot of tricky work to understand how a subsystem works and put in the necessary protections, and frankly many subsystems have had these bugs for their entire existance. It isn't urgent. Several subsystems do get it right, it is very possible and the best practices are much more aligned with the Device<Bound> stuff in Rust. ie guarantee in most contexts that remove() can't run. I'm not surprised to hear pushback on trying to fix it, especially if the proposed fixes are not subsystem comphrenesive in nature. Sprinkling SRCU around as partial patches, especially in drivers, is not a good idea, IMHO. The reason cdev keeps coming up is because there are few common ways a typical driver can actually generate concurrent operations during and after remove that would be problematic. File descriptors, subsystem callbacks, work queues, timers, interrupts, and notifiers. The latter already have robust schemes to help the driver shutdown and end the concurrent operations. ie cancel_work_sync(), del_timer_sync(), free_irq(), and *notifier_unregister(). Many wrappered file descriptors are safe. For example the sysfs usage in a driver is sync stopped during device_del's calls to sysfs remove functions. IMHO the largest systemic issue in this space is people making their own fops without understanding the lifecycle model and without hand-rolling a special a "_sync" kind of shutdown around it. A managed fops with a sync destruction operation would go a long way to closing these issues. ie the gpiolib example was basically all fops, one work and a bunch of places where the protection was redundant. Yes there are other cases, and certainly I've commonly seen cases of drivers reaching into other drivers, and subsystem non-file ops, but these cases usualy have other more fundamental issues with remove races :( So I would probably also take a strong position that introducing something like DevRes where you try to wrapper MMIO or other device resources is adamently not something we want to do. Not because I don't care about these removal races, but because I want the drivers to run in a Device<Bound> context with very few exceptions. Jason
