This crash occurred while writing 1 to /sys/block/sda/device/delete at
the same instant that another process was closing the block device:
BUG: unable to handle kernel NULL pointer dereference at 00000230
IP: [<c138fa9c>] blk_get_backing_dev_info+0xc/0x20
Oops: 0000 [#1] PREEMPT SMP
Call Trace:
[<c112da2a>] ? __filemap_fdatawrite_range+0x15a/0x180
[<c112d9b5>] ? __filemap_fdatawrite_range+0xe5/0x180
[<c112dae8>] filemap_write_and_wait+0x38/0x70
[<c11b79b1>] fsync_bdev+0x41/0x50
[<c13a4f7c>] invalidate_partition+0x1c/0x40
[<c13a5d0f>] del_gendisk+0xcf/0x1c0
[<c15c7143>] sd_remove+0x53/0xb0
[<c157eaf0>] __device_release_driver+0x80/0x120
[<c157ebad>] device_release_driver+0x1d/0x30
[<c157e392>] bus_remove_device+0xb2/0xf0
[<c157b45c>] device_del+0xec/0x1e0
[<c13b6d88>] ? kobject_put+0x58/0xc0
[<c15c12af>] __scsi_remove_device+0xaf/0xc0
[<c15c12df>] scsi_remove_device+0x1f/0x30
[<c15c131b>] sdev_store_delete+0x2b/0x40
[<c15c12f0>] ? scsi_remove_device+0x30/0x30
[<c157a87f>] dev_attr_store+0x1f/0x40
...
[<c11829bc>] SyS_write+0x4c/0xb0
EIP: [<c138fa9c>] blk_get_backing_dev_info+0xc/0x20 SS:ESP 0068:f5eb9d18
It is caused by this race: between the time Thread B's instance of
filemap_write_and_wait() has checked whether there are any pages to flush
and the time it dereferences bdev->bd_disk, Thread A can clear that pointer
in __blkdev_put().
Thread A:                           Thread B:
blkdev_close()                      sdev_store_delete()
  blkdev_put()                        sd_remove()
    __blkdev_put()                      del_gendisk()
      mutex_lock(bd_mutex);               invalidate_partition()
      sync_blockdev()                       fsync_bdev()
        filemap_write_and_wait()              filemap_write_and_wait()
          if (mapping has pages)                if (mapping has pages)
            deref bdev->bd_disk (OK)
      Set bdev->bd_disk = NULL;
      mutex_unlock(bd_mutex);                     deref bdev->bd_disk (BOOM!)
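The interleaving in the diagram can be replayed step by step in user space.
The following is only a sketch under stated assumptions: every model_* name
is a hypothetical stand-in for the kernel object of similar name, nr_pages
stands in for the "mapping has pages" test, and the NULL check in the last
step exists only so the model can report the failure instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct model_queue { int id; };
struct model_disk  { struct model_queue *queue; };
struct model_bdev  { struct model_disk *bd_disk; int nr_pages; };

/* Models the "if (mapping has pages)" test both threads perform. */
static int model_mapping_has_pages(const struct model_bdev *bdev)
{
	return bdev->nr_pages > 0;
}

/* Models Thread A's last step in __blkdev_put(): clear the pointer. */
static void model_thread_a_clear(struct model_bdev *bdev)
{
	bdev->bd_disk = NULL;
}

/*
 * Models Thread B's late dereference. The kernel has no NULL check at
 * this point, so the NULL returned here marks where it would oops.
 */
static struct model_queue *model_thread_b_get_queue(const struct model_bdev *bdev)
{
	if (bdev->bd_disk == NULL)
		return NULL;
	return bdev->bd_disk->queue;
}

/* Replays the bad ordering; returns 1 if Thread B would have hit NULL. */
int model_replay_race(void)
{
	struct model_queue q = { 1 };
	struct model_disk d = { &q };
	struct model_bdev bdev = { &d, 1 };

	if (!model_mapping_has_pages(&bdev))	/* B: check passes */
		return 0;
	model_thread_a_clear(&bdev);		/* A: runs inside B's window */
	return model_thread_b_get_queue(&bdev) == NULL;	/* B: late deref */
}
```

In this model model_replay_race() returns 1: nothing ties Thread B's check
to its later dereference, so Thread A's store can land between them.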
The dereference of bdev->bd_disk occurs on this sub-chain:
filemap_write_and_wait()
  __filemap_fdatawrite_range()
    mapping_cap_writeback_dirty()
      inode_to_bdi()
        blk_get_backing_dev_info()
          bdev_get_queue()
            return bdev->bd_disk->queue;
The problem was introduced by de1414a654e6 ("fs: export inode_to_bdi and use
it in favor of mapping->backing_dev_info"). Before that change,
mapping_cap_writeback_dirty() directly retrieved the backing_dev_info from
the mapping rather than looking it up through
mapping->host->inode_dev->bdev->bd_disk->queue.
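The difference between the two lookups can be illustrated with a small
user-space model. This is a sketch, not the kernel code: the model_* types
and the cached_bdi field are hypothetical names chosen for illustration, and
the NULL check in model_bdi_after() is added only so the model can report
the failure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; field names only loosely follow the kernel's. */
struct model_bdi   { int capabilities; };
struct model_queue { struct model_bdi backing_dev_info; };
struct model_disk  { struct model_queue *queue; };
struct model_bdev  { struct model_disk *bd_disk; };

struct model_mapping {
	struct model_bdi *cached_bdi;	/* old style: pointer kept in the mapping */
	struct model_bdev *bdev;	/* new style: reached through the host inode */
};

/* Lookup as described for the code before de1414a654e6. */
static struct model_bdi *model_bdi_before(const struct model_mapping *m)
{
	return m->cached_bdi;
}

/* Lookup after the change: walks through bd_disk on every call. */
static struct model_bdi *model_bdi_after(const struct model_mapping *m)
{
	struct model_disk *disk = m->bdev->bd_disk;

	if (disk == NULL)	/* the kernel would dereference NULL here */
		return NULL;
	return &disk->queue->backing_dev_info;
}

/* Returns 1 if only the new-style lookup is affected by clearing bd_disk. */
int model_lookup_after_clear(void)
{
	struct model_queue q = { { 0 } };
	struct model_disk d = { &q };
	struct model_bdev bdev = { &d };
	struct model_mapping m = { &q.backing_dev_info, &bdev };

	bdev.bd_disk = NULL;	/* what __blkdev_put() does on last close */
	return model_bdi_before(&m) != NULL && model_bdi_after(&m) == NULL;
}
```

In this model model_lookup_after_clear() returns 1: the old-style lookup
never touches bd_disk, so clearing that pointer cannot affect it.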
This was found while running a stress test on an ARM-based embedded system
which involved repeatedly shutting down many services simultaneously via
systemd isolate (thereby making it likely that "Thread B" was preempted for
a while just before it dereferenced bdev->bd_disk). I subsequently reproduced
this on vanilla Linux 4.6 in QEMU/x86.
This patch fixes the race by making sd_remove() hold the bd_mutex of the
whole-disk block device (partition 0) during the call to del_gendisk().
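The effect of taking the same mutex on both paths can be sketched in user
space with pthreads. This is a model under stated assumptions, not the
kernel code: the model_* names are hypothetical, and the close path is
assumed to empty the mapping before it clears bd_disk, all under the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects. */
struct model_disk { int id; };

struct model_bdev {
	pthread_mutex_t bd_mutex;
	struct model_disk *bd_disk;
	int nr_pages;
};

/* Close path (Thread A): empty the mapping, then clear bd_disk, locked. */
static void *model_close_path(void *arg)
{
	struct model_bdev *bdev = arg;

	pthread_mutex_lock(&bdev->bd_mutex);
	bdev->nr_pages = 0;
	bdev->bd_disk = NULL;
	pthread_mutex_unlock(&bdev->bd_mutex);
	return NULL;
}

/*
 * Remove path (Thread B) with the fix applied: the page check and the
 * pointer use happen under the same lock as the close path.
 * Returns 1 if a valid disk was used, 0 if there was nothing to flush,
 * and -1 if bd_disk was NULL while pages were still present.
 */
static int model_remove_path(struct model_bdev *bdev)
{
	int outcome = 0;

	pthread_mutex_lock(&bdev->bd_mutex);
	if (bdev->nr_pages > 0)
		outcome = bdev->bd_disk ? 1 : -1;
	pthread_mutex_unlock(&bdev->bd_mutex);
	return outcome;
}

/* Runs both paths concurrently and reports the remove path's outcome. */
int model_run_fixed(void)
{
	struct model_disk disk = { 7 };
	struct model_bdev bdev;
	pthread_t closer;
	int outcome;

	pthread_mutex_init(&bdev.bd_mutex, NULL);
	bdev.bd_disk = &disk;
	bdev.nr_pages = 1;

	if (pthread_create(&closer, NULL, model_close_path, &bdev) != 0) {
		pthread_mutex_destroy(&bdev.bd_mutex);
		return -2;	/* could not start the second thread */
	}
	outcome = model_remove_path(&bdev);
	pthread_join(closer, NULL);
	pthread_mutex_destroy(&bdev.bd_mutex);
	return outcome;
}
```

Whichever thread wins the lock, the remove path in this model sees either
pages together with a valid disk (1) or an empty mapping (0). The -1 case,
which corresponds to the oops, would require the clear to land between the
check and the use, and the shared mutex rules that out.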
Fixes: de1414a654e6 ("fs: export inode_to_bdi and use it in favor of mapping->backing_dev_info")
Signed-off-by: Howard Cochran <[email protected]>
Cc: Howard Cochran <[email protected]>
Cc: [email protected]
Cc: Christoph Hellwig <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Martin K. Petersen <[email protected]>
---
drivers/scsi/sd.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index f52b74c..0f53925 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3126,6 +3126,7 @@ static int sd_remove(struct device *dev)
 {
 	struct scsi_disk *sdkp;
 	dev_t devt;
+	struct block_device *bdev;
 
 	sdkp = dev_get_drvdata(dev);
 	devt = disk_devt(sdkp->disk);
@@ -3134,7 +3135,13 @@ static int sd_remove(struct device *dev)
 	async_synchronize_full_domain(&scsi_sd_pm_domain);
 	async_synchronize_full_domain(&scsi_sd_probe_domain);
 	device_del(&sdkp->dev);
+
+	bdev = bdget_disk(sdkp->disk, 0);
+	mutex_lock(&bdev->bd_mutex);
 	del_gendisk(sdkp->disk);
+	mutex_unlock(&bdev->bd_mutex);
+	bdput(bdev);
+
 	sd_shutdown(dev);
 
 	blk_register_region(devt, SD_MINORS, NULL,
--
1.9.1