On 8/16/19 6:55 AM, Ming Lei wrote:
The kernfs built-in lock of 'kn->count' is held in the sysfs .show/.store
path. Meanwhile, inside block's .show/.store callbacks, q->sysfs_lock is
required.

However, when the mq & iosched kobjects are removed via
blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
too. This causes an AB-BA deadlock, because the kernfs built-in lock of
'kn->count' is also required inside kobject_del(); see the lockdep warning [1].
On the other hand, it isn't necessary to acquire q->sysfs_lock for
either blk_mq_unregister_dev() or elv_unregister_queue(), because
clearing the REGISTERED flag prevents stores to 'queue/scheduler'
from happening. Also, sysfs writes (store) are exclusive, so there is
no need to hold the lock for elv_unregister_queue() when it is
called on the elevator-switching path.
Fix the issue by not holding q->sysfs_lock in blk_mq_unregister_dev() &
elv_unregister_queue().
Have you considered splitting sysfs_lock into multiple mutexes? Today it
is very hard to verify the correctness of block layer code that uses
sysfs_lock because it has not been documented anywhere what that mutex
protects. I think that mutex should be split into at least two mutexes:
one that protects switching I/O schedulers and another one that protects
hctx->tags and hctx->sched_tags.
Thanks,
Bart.