I/O Error Test 6
================
Verify that when two bcache devices share a cache device and I/O errors
occur in only one of their backing devices, only the bcache device on
top of the failing backing device is taken offline.
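The pass criterion can be checked mechanically once the test has run. The
sketch below is a minimal illustration and not part of the original test
scripts: the function names bcache_dev_present and check_only_bcache0_stopped
and the SYSFS_ROOT override are hypothetical, and it assumes that a registered
bcache device appears as a directory under /sys/block (bcache0, bcache1).

```shell
#!/bin/sh
# Hypothetical check, not part of the original test scripts.
# SYSFS_ROOT defaults to /sys and can be overridden for a dry run.

# Succeed if block device $1 (e.g. bcache0) is currently registered,
# i.e. it has a directory under <sysfs>/block.
bcache_dev_present() {
    [ -d "${SYSFS_ROOT:-/sys}/block/$1" ]
}

# Expected outcome of this test: bcache0 (whose backing device gets the
# injected errors) has been stopped, while bcache1, which shares the
# same cache device, is still registered.
check_only_bcache0_stopped() {
    if bcache_dev_present bcache0; then
        echo "FAIL: bcache0 is still registered"
        return 1
    fi
    if ! bcache_dev_present bcache1; then
        echo "FAIL: bcache1 was stopped too"
        return 1
    fi
    echo "PASS: only bcache0 was stopped"
}
```

Run check_only_bcache0_stopped after both dd jobs have exited. With the
original kernel it would be expected to report FAIL, since bcache0 stays
registered there.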
Original
--------
# uname -rv
4.15.0-55-generic #60-Ubuntu SMP Tue Jul 2 18:22:20 UTC 2019
# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 20.464988] bcache: register_bdev() registered backing device dm-0
[ 20.474082] bcache: register_bdev() registered backing device dm-1
[ 20.485464] bcache: run_cache_set() invalidating existing data
[ 20.496253] bcache: register_cache() registered cache device dm-2
[ 21.478688] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set e247271f-eea4-46ea-8ffe-4b04299a7c24
[ 21.484183] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set e247271f-eea4-46ea-8ffe-4b04299a7c24
# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback
# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none
# ./dm_fake_dev.sh /dev/loop0 bad
[ 56.142318] Buffer I/O error on dev dm-0, logical block 262128, async page read
[ 56.151754] Buffer I/O error on dev dm-0, logical block 262128, async page read
[ 56.158706] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[ 56.161631] Buffer I/O error on dev bcache0, logical block 262112, async page read
[ 56.169723] Buffer I/O error on dev bcache0, logical block 262112, async page read
[ 56.172303] Buffer I/O error on dev bcache0, logical block 1, async page read
# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1400
[2] 1401
# [ 85.400054] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[ 85.403138] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[ 85.417864] Buffer I/O error on dev bcache0, logical block 2, lost async page write
[ 85.419699] Buffer I/O error on dev bcache0, logical block 3, lost async page write
[ 85.421510] Buffer I/O error on dev bcache0, logical block 4, lost async page write
[ 85.423301] Buffer I/O error on dev bcache0, logical block 5, lost async page write
[ 85.450407] Buffer I/O error on dev bcache0, logical block 6, lost async page write
[ 85.452248] Buffer I/O error on dev bcache0, logical block 7, lost async page write
[ 85.453980] Buffer I/O error on dev bcache0, logical block 8, lost async page write
[ 85.455722] Buffer I/O error on dev bcache0, logical block 9, lost async page write
dd: error writing '/dev/bcache0': No space left on device
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.23916 s, 253 MB/s
[2]+ Exit 1 dd if=/dev/zero of=/dev/bcache0 bs=4k
# 262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 6.5355 s, 164 MB/s
[1]+ Exit 1 dd if=/dev/zero of=/dev/bcache1 bs=4k
# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
fake-loop0     253:0    0    1G  0 dm
└─bcache0      251:0    0 1024M  0 disk
Nothing is removed: bcache does not detect the I/O errors on the backing
device, so bcache0 stays registered even though writes to it fail.
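Both runs switch the two backing devices to writeback mode before injecting
errors, and the cat output marks the active mode in square brackets. A small
sketch that automates that check, assuming this bracket format; the helper
names selected_cache_mode and all_backing_writeback and the SYSFS_ROOT
override are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original test scripts.

# Print the active mode from a cache_mode line such as
#   writethrough [writeback] writearound none
# (the active mode is the one enclosed in square brackets).
selected_cache_mode() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Succeed only if at least one dm-* device exposes a bcache cache_mode
# file and every such device is in writeback mode.
# SYSFS_ROOT defaults to /sys and can be overridden for a dry run.
all_backing_writeback() {
    _found=0
    for _f in "${SYSFS_ROOT:-/sys}"/block/dm-*/bcache/cache_mode; do
        [ -e "$_f" ] || continue
        _found=1
        [ "$(selected_cache_mode "$(cat "$_f")")" = writeback ] || return 1
    done
    [ "$_found" -eq 1 ]
}
```

Calling all_backing_writeback right after the tee command would catch a
backing device that silently kept its previous mode.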
Modified
--------
# uname -rv
4.15.0-55-generic #60+test20190703build1bcache1-Ubuntu SMP Wed Jul 3 21:41:37 UTC
# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 29.341483] bcache: register_bdev() registered backing device dm-0
[ 29.351963] bcache: run_cache_set() invalidating existing data
[ 29.365256] bcache: register_cache() registered cache device dm-2
[ 29.365566] bcache: register_bdev() registered backing device dm-1
[ 30.357267] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set ee57b847-da70-4b65-888b-e5795ecf6c46
[ 30.363117] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set ee57b847-da70-4b65-888b-e5795ecf6c46
# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback
# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none
# ./dm_fake_dev.sh /dev/loop0 bad
[ 63.680438] Buffer I/O error on dev dm-0, logical block 262128, async page read
[ 63.686730] Buffer I/O error on dev dm-0, logical block 262128, async page read
[ 63.693292] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[ 63.695221] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 63.695248] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 63.695253] Buffer I/O error on dev bcache0, logical block 262112, async page read
[ 63.697374] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 63.697389] Buffer I/O error on dev bcache0, logical block 262112, async page read
[ 63.697414] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 63.697430] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 63.697433] Buffer I/O error on dev bcache0, logical block 1, async page read
# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1391
[2] 1392
# [ 75.160405] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 75.200618] Buffer I/O error on dev bcache0, logical block 0, lost async page write
[ 75.204141] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 75.207546] Buffer I/O error on dev bcache0, logical block 1, lost async page write
[ 75.210500] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 75.212927] Buffer I/O error on dev bcache0, logical block 2, lost async page write
[ 75.214696] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 75.216750] Buffer I/O error on dev bcache0, logical block 3, lost async page write
[ 75.218637] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 75.220841] Buffer I/O error on dev bcache0, logical block 4, lost async page write
[ 75.222737] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
...
[ 75.397159] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 75.399609] bcache: bch_cached_dev_error() stop bcache0: too many IO errors on backing device dm-0
[ 75.399609]
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in[ 76.558623] bcache: bcache_device_free() bcache0 stopped
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.15178 s, 499 MB/s
dd: error writing '/dev/bcache1': No space left on device
[2]+ Exit 1 dd if=/dev/zero of=/dev/bcache0 bs=4k
# 262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.65434 s, 231 MB/s
[1]+ Exit 1 dd if=/dev/zero of=/dev/bcache1 bs=4k
#
# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
fake-loop0     253:0    0    1G  0 dm
Only bcache0 was stopped; bcache1, which shares the cache device dm-2
with it, remains working.
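The kernel log of the modified run shows the sequence behind this result:
each failed request on dm-0 logs an "IO error on backing device" message, and
once enough of them have accumulated, bcache reports "too many IO errors" and
stops bcache0 only. The sketch below summarizes that from a saved log. It
assumes the message texts shown above, one message per line as dmesg prints
them; the helper name summarize_backing_errors is hypothetical, and the error
threshold itself is not visible in this transcript.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original test scripts.
# Reads kernel log text on stdin and prints, for backing device $1 and
# bcache device $2, how many backing-device I/O errors bcache logged
# and whether it stopped the bcache device.
# Example: dmesg | summarize_backing_errors dm-0 bcache0
summarize_backing_errors() {
    _log=$(cat)
    # grep -c prints 0 (and exits 1) when nothing matches.
    _errors=$(printf '%s\n' "$_log" |
        grep -cF "bch_count_backing_io_errors() $1: IO error on backing device" ||
        true)
    if printf '%s\n' "$_log" |
        grep -qF "bch_cached_dev_error() stop $2: too many IO errors"; then
        _stopped=yes
    else
        _stopped=no
    fi
    echo "errors=$_errors stopped=$_stopped"
}
```

For the modified run this would be expected to report stopped=yes for
dm-0/bcache0 and stopped=no for dm-1/bcache1. The original kernel does not
emit these messages at all, so both would show errors=0.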
# reboot
# ./setup-two-bcache-one-cache.reboot.sh >/dev/null 2>&1
[ 40.814007] bcache: register_bdev() registered backing device dm-0
[ 40.890096] bcache: register_bdev() registered backing device dm-1
[ 41.007100] bcache: bch_journal_replay() journal replay done, 63545 keys in 37 entries, seq 148
[ 41.012823] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set ee57b847-da70-4b65-888b-e5795ecf6c46
[ 41.021751] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set ee57b847-da70-4b65-888b-e5795ecf6c46
[ 41.025195] bcache: register_cache() registered cache device dm-2
# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk
#
After reboot, both bcache devices (bcache0 and bcache1) were reattached
to the cache set successfully.
--
https://bugs.launchpad.net/bugs/1829563
Title:
bcache: risk of data loss on I/O errors in backing or caching devices
Status in linux package in Ubuntu:
Invalid
Status in linux source package in Bionic:
In Progress
Status in linux source package in Cosmic:
In Progress
Bug description:
[Impact]
* The bcache code in Bionic lacks several fixes to handle
I/O errors in both backing devices and caching devices.
* Partial or permanent errors in backing or caching devices,
especially in writeback mode, can lead to data loss and/or
leave the application unaware of failed I/O requests.
* The bcache device might remain available for I/O requests
even if its backing device is offline, so the outcome of
writes is undefined.
[Test Case]
* Detailed test cases/steps for the behavior of almost every
patch with code logic changes are provided in the bug comments.
* The patchset has been tested for regressions on each cache
mode (writethrough, writeback, writearound, none) with the
xfstests test suite (on ext4), fio (random read-write) and
iozone (several read/write tests).
[Regression Potential]
* The patchset is relatively large and touches several areas
of the bcache code; however, synthetic testing of the patches
has been performed, and extensive regression/stress tests
were run (as described in the Test Case section).
* Many patches in the patchset are 'Fixes' patches for other
patches, and no further 'Fixes' patches currently exist upstream.
[Other Info]
* Canonical Field Engineering deploys bcache in writeback mode
extensively (e.g., BootStack, UA cloud), except in rare all-flash cases.
[Original Bug Description]
This is a request for a backport of the following upstream patch from 4.18:
"bcache: stop bcache device when backing device is offline"
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee
Field Engineering uses bcache quite extensively, and it would be good
to have this patch in the GA/Bionic kernel.