I/O Error Test 6 (for the Disco kernel)
================
commit: 'Revert "bcache: set CACHE_SET_IO_DISABLE in
bch_cached_dev_error()"'
Problem: when one backing device hits too many I/O errors, the
cache set it is attached to is disabled as well. If that cache
set is shared with other bcache devices, they are stopped too,
even though their own backing devices are not failing.
Original kernel: all bcache devices that share the cache device
with the failing backing device are stopped.
Modified kernel: only the bcache device with the failing backing
device is stopped.
Original kernel:
---------------
root@bionic-bcache:~# uname -rv
5.0.0-21-generic #22-Ubuntu SMP Tue Jul 2 13:27:33 UTC 2019
root@bionic-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 23.323929] bcache: register_bdev() registered backing device dm-0
[ 23.330821] bcache: register_bdev() registered backing device dm-1
[ 23.335493] bcache: run_cache_set() invalidating existing data
[ 23.347255] bcache: register_cache() registered cache device dm-2
[ 24.335738] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set
f816e09d-f744-4fc9-b3bd-239f3d5093c6
[ 24.342388] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set
f816e09d-f744-4fc9-b3bd-239f3d5093c6
root@bionic-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback
# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always
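A minimal sketch for confirming that both settings took effect before
injecting errors. It assumes that reading these sysfs attributes returns
every choice with the active one in square brackets (for example
"writethrough [writeback] writearound none"). The helper names
active_choice and check_attr are hypothetical and are not part of the
test scripts used here.

```shell
# Print the bracketed (active) entry of a sysfs choice list.
# Assumption: the attribute reads like
#   "writethrough [writeback] writearound none"
active_choice() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# check_attr FILE EXPECTED
# Succeeds if the active entry stored in FILE equals EXPECTED.
check_attr() {
    [ "$(active_choice "$(cat "$1")")" = "$2" ]
}
```

A possible use would be to loop over /sys/block/bcache*/bcache and call
check_attr "$d/cache_mode" writeback and
check_attr "$d/stop_when_cache_set_failed" always for each directory $d.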
root@bionic-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[ 58.915344] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 58.921948] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 58.928886] bcache: register_bcache() error /dev/dm-0: device already
registered (emitting change event)
[ 58.931006] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 58.936386] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 58.939346] Buffer I/O error on dev bcache0, logical block 262112, async
page read
root@bionic-bcache:~# [ 58.944685] bcache: bch_count_backing_io_errors()
dm-0: IO error on backing device, unrecoverable
[ 58.948468] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 58.951078] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 58.954231] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 58.957216] Buffer I/O error on dev bcache0, logical block 1, async page read
# ./dm_fake_dev.sh /dev/loop0 bad
[ 167.341298] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 167.347802] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 167.354959] bcache: register_bcache() error /dev/dm-0: device already
registered (emitting change event)
[ 167.356585] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 167.364784] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 167.369083] Buffer I/O error on dev bcache0, logical block 262112, async
page read
root@bionic-bcache:~# [ 167.376976] bcache: bch_count_backing_io_errors()
dm-0: IO error on backing device, unrecoverable
[ 167.381644] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 167.384195] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 167.387144] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 167.390040] Buffer I/O error on dev bcache0, logical block 1, async page read
root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero
of=/dev/bcache0 bs=4k &
[1] 1464
[2] 1465
root@bionic-bcache:~# [ 178.103060] bcache: bch_count_backing_io_errors()
dm-0: IO error on backing device, unrecoverable
[ 178.107790] Buffer I/O error on dev bcache0, logical block 0, lost async
page write
[ 178.111814] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.116428] Buffer I/O error on dev bcache0, logical block 1, lost async
page write
[ 178.119286] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.122070] Buffer I/O error on dev bcache0, logical block 2, lost async
page write
[ 178.122601] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.128535] Buffer I/O error on dev bcache0, logical block 3, lost async
page write
[ 178.132472] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.136169] Buffer I/O error on dev bcache0, logical block 4, lost async
page write
[ 178.139426] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.143021] Buffer I/O error on dev bcache0, logical block 5, lost async
page write
[ 178.146279] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.149876] Buffer I/O error on dev bcache0, logical block 6, lost async
page write
[ 178.153119] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.156697] Buffer I/O error on dev bcache0, logical block 7, lost async
page write
[ 178.159941] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.163519] Buffer I/O error on dev bcache0, logical block 8, lost async
page write
[ 178.166783] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.170706] Buffer I/O error on dev bcache0, logical block 9, lost async
page write
[ 178.173933] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.177574] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.181235] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
...
[ 178.362803] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 178.366412] bcache: bch_cached_dev_error() stop bcache0: too many IO errors
on backing device dm-0
[ 178.366412]
[ 178.501362] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
[ 178.504932] bcache: bch_cache_set_error() bcache: error on
f816e09d-f744-4fc9-b3bd-239f3d5093c6:
[ 178.509390] journal io error
[ 178.509391] bcache: bch_cache_set_error() , disabling caching
[ 178.509391]
[ 178.517586] bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of bcache0 is "always", stop it for failed cache set
f816e09d-f744-4fc9-b3bd-239f3d5093c6.
[ 178.524925] bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of bcache1 is "always", stop it for failed cache set
f816e09d-f744-4fc9-b3bd-239f3d5093c6.
[ 178.562349] bcache: cached_dev_detach_finish() Caching disabled for dm-1
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.89317 s, 371 MB/s
[ 180.186818] bcache: bcache_device_free() bcache0 stopped
[ 180.188875] bcache: bch_count_io_errors() dm-2: IO error on writing btree.
[ 180.214681] bcache: cache_set_free() Cache set
f816e09d-f744-4fc9-b3bd-239f3d5093c6 unregistered
dd: error writing '/dev/bcache1': No space left on device
[ 181.732575] bcache: bcache_device_free() bcache1 stopped
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.44023 s, 242 MB/s
root@bionic-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
fake-loop0     253:0    0    1G  0 dm
Result: both bcache0 and bcache1 are removed.
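The same outcome can be read from the kernel log. A small sketch,
assuming the log is supplied as text on standard input (for example from
dmesg) and that each stopped device produces a
"bcache_device_free() bcacheN stopped" line as in the run above.
stopped_devs is a hypothetical helper name.

```shell
# Read kernel log text on stdin and print the name of every bcache
# device that was reported as stopped, one per line, in log order.
stopped_devs() {
    sed -n 's/.*bcache_device_free() \(bcache[0-9][0-9]*\) stopped.*/\1/p'
}
```

Run against the log of the original kernel this would be expected to
print bcache0 and bcache1; against the log of the modified kernel, only
bcache0.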
Modified kernel:
---------------
root@bionic-bcache:~# uname -rv
5.0.0-21-generic #22+test20190707build1 SMP Mon Jul 8 01:50:31 UTC 2019
root@bionic-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 25.668092] bcache: register_bdev() registered backing device dm-0
[ 25.680959] bcache: register_bdev() registered backing device dm-1
[ 25.686178] bcache: run_cache_set() invalidating existing data
[ 25.695269] bcache: register_cache() registered cache device dm-2
[ 26.691859] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set
b3823d82-8753-44ef-a7df-e1271b667021
[ 26.698108] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set
b3823d82-8753-44ef-a7df-e1271b667021
root@bionic-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback
# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always
root@bionic-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[ 49.073126] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 49.079509] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 49.086012] bcache: register_bcache() error /dev/dm-0: device already
registered (emitting change event)
[ 49.088466] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 49.093359] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 49.096298] Buffer I/O error on dev bcache0, logical block 262112, async
page read
root@bionic-bcache:~# [ 49.100583] bcache: bch_count_backing_io_errors()
dm-0: IO error on backing device, unrecoverable
[ 49.103578] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 49.107542] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 49.111926] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 49.116200] Buffer I/O error on dev bcache0, logical block 1, async page
read
root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero
of=/dev/bcache0 bs=4k &
[1] 1453
[2] 1454
root@bionic-bcache:~# [ 55.398092] bcache: bch_count_backing_io_errors()
dm-0: IO error on backing device, unrecoverable
[ 55.404433] Buffer I/O error on dev bcache0, logical block 0, lost async
page write
[ 55.409868] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.414151] Buffer I/O error on dev bcache0, logical block 1, lost async
page write
[ 55.417134] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.420521] Buffer I/O error on dev bcache0, logical block 2, lost async
page write
[ 55.423094] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.428977] Buffer I/O error on dev bcache0, logical block 3, lost async
page write
[ 55.433236] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.436400] Buffer I/O error on dev bcache0, logical block 4, lost async
page write
[ 55.439314] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
...
[ 55.720661] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.726927] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.734921] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.743469] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.747248] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.750829] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.754349] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 55.757972] bcache: bch_cached_dev_error() stop bcache0: too many IO errors
on backing device dm-0
[ 55.757972]
dd: error writing '/dev/bcache1': No space left on device
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 3.62916 s, 296 MB/s
[ 58.188089] bcache: bcache_device_free() bcache0 stopped
root@bionic-bcache:~# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
fake-loop0     253:0    0    1G  0 dm
Result: bcache0 is removed, bcache1 is still available.
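The pass/fail criterion of this test can be sketched as a check on which
bcache devices are still present afterwards. The sketch assumes that
each live bcache device has a /sys/block/bcacheN entry (the same
location used for the sysfs settings in the transcript).
list_bcache_devs and check_survivors are hypothetical helper names.

```shell
# list_bcache_devs DIR
# Print the bcache device names found under DIR (normally /sys/block),
# one per line, in glob (alphabetical) order.
list_bcache_devs() {
    for d in "$1"/bcache*; do
        [ -e "$d" ] || continue   # glob matched nothing
        basename "$d"
    done
}

# check_survivors DIR "NAME NAME ..."
# Succeeds if exactly the listed devices (space-separated, sorted)
# are present under DIR. Pass "" to assert that none are left.
check_survivors() {
    actual=$(list_bcache_devs "$1" | tr '\n' ' ' | sed 's/ $//')
    [ "$actual" = "$2" ]
}
```

With the behavior shown above, check_survivors /sys/block "bcache1"
would be expected to succeed on the modified kernel after the test, and
check_survivors /sys/block "" on the original kernel.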
--
https://bugs.launchpad.net/bugs/1829563
Title:
bcache: risk of data loss on I/O errors in backing or caching devices
Status in linux package in Ubuntu:
Invalid
Status in linux source package in Bionic:
In Progress
Status in linux source package in Cosmic:
In Progress
Bug description:
[Impact]
* The bcache code in Bionic lacks several fixes to handle
I/O errors in both backing devices and caching devices.
* Partial or permanent errors in backing or caching devices,
especially in writeback mode, can lead to data loss and/or
to the application not being notified about failed I/O requests.
* The bcache device might remain available for I/O requests
even if the backing device is offline, so the outcome of
writes is undefined.
[Test Case]
* Detailed test cases/steps for the behavior of many patches
with code logic changes are provided in bug comments.
* The patchset has been tested for regressions on each cache
mode (writethrough, writeback, writearound, none) with the
xfstests test suite (on ext4) and fio (sequential + random
read-write).
[Regression Potential]
* The patchset is relatively large and touches several areas
in bcache code; however, synthetic testing of the patches
has been performed, and extensive regression/stress tests
were run (as mentioned in the Test Case section).
* Many patches in the patchset are 'Fixes' patches to other
patches, and no further 'Fixes' currently exist upstream.
[Other Info]
* Canonical Field Eng. deploys bcache+writeback extensively
(e.g., BootStack, UA cloud, except rare all-flash cases).
[Original Bug Description]
This is a request for a backport of the following upstream patch from
4.18:
"bcache: stop bcache device when backing device is offline"
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee
Field engineering uses bcache quite extensively and it would be good
to have this in the GA/bionic kernel.