I/O Error Test 6 (for the Cosmic kernel)
========================================
commit: 'Revert "bcache: set CACHE_SET_IO_DISABLE in
bch_cached_dev_error()"'
Problem: if one backing device hits I/O errors, the cache device
is disabled. If that cache device is shared with other bcache
devices, they are stopped too, even when their own backing
devices are not failing.
Original kernel: all bcache devices that share the cache device
with the failing backing device are stopped.
Modified kernel: only the bcache device with the failing backing
device is stopped.
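When reproducing this test, it can help to confirm from the kernel log that only one backing device is reporting errors. The following is a minimal sketch, not part of the original test scripts: the function name count_backing_io_errors is made up for illustration, and it assumes the bch_count_backing_io_errors() message format seen in the logs in this message.

```shell
# Count "IO error on backing device" messages per backing device.
# Reads a kernel log from stdin and prints "<device> <count>" lines.
# Assumption: messages look like
#   bcache: bch_count_backing_io_errors() dm-0: IO error on backing ...
count_backing_io_errors() {
    sed -n 's/.*bch_count_backing_io_errors() \([^:]*\): IO error.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Example usage on a live system:
#   dmesg | count_backing_io_errors
```

If only the failing device (dm-0 in this test) shows up in the output, the errors are confined to one backing device, as the test intends.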
Original kernel
---------------
root@guest-bcache:~# uname -rv
4.18.0-23-generic #24-Ubuntu SMP Wed Jun 12 18:17:39 UTC 2019
root@guest-bcache:~# lsblk -e 252
root@guest-bcache:~#
root@guest-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 35.686002] bcache: register_bdev() registered backing device dm-0
[ 35.695980] bcache: register_bdev() registered backing device dm-1
[ 35.704662] bcache: run_cache_set() invalidating existing data
[ 35.719046] bcache: register_cache() registered cache device dm-2
[ 36.705686] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set
fce8d558-4657-47dc-ab37-226ada14daf5
[ 36.711827] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set
fce8d558-4657-47dc-ab37-226ada14daf5
root@guest-bcache:~# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
└─bcache1 251:128 0 1024M 0 disk
loop2 7:2 0 1G 0 loop
└─fake-loop2 253:2 0 1024M 0 dm
├─bcache0 251:0 0 1024M 0 disk
└─bcache1 251:128 0 1024M 0 disk
root@guest-bcache:~# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback
root@guest-bcache:~# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none
root@guest-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[ 76.875749] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 76.882159] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 76.889453] bcache: register_bcache() error /dev/dm-0: device already
registered (emitting change event)
[ 76.892183] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 76.904907] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 76.907711] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 76.912607] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 76.916905] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 76.920345] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 76.924767] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 76.928404] Buffer I/O error on dev bcache0, logical block 1, async page read
root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero
of=/dev/bcache0 bs=4k &
[ 175.024811] Buffer I/O error on dev bcache0, logical block 0, lost async
page write
[ 175.029844] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 175.034652] Buffer I/O error on dev bcache0, logical block 1, lost async
page write
[ 175.037465] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 175.040373] Buffer I/O error on dev bcache0, logical block 2, lost async
page write
...
[ 175.092196] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 175.096635] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 175.101272] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 175.105829] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
...
[ 175.235700] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 175.239457] bcache: bch_cached_dev_error() stop bcache0: too many IO errors
on backing device dm-0
[ 175.239457]
[ 175.324069] bcache: bch_cache_set_error() CACHE_SET_IO_DISABLE already set
[ 175.328998] bcache: error on fce8d558-4657-47dc-ab37-226ada14daf5:
[ 175.328999] journal io error
[ 175.331022] , disabling caching
[ 175.334264] bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of bcache0 is "auto" and cache is dirty, stop it to
avoid potential data corruption.
[ 175.338865] bcache: conditional_stop_bcache_device()
stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop it to
avoid potential data corruption.
[ 175.344097] bcache: cached_dev_detach_finish() Caching disabled for dm-1
[ 176.080139] bcache: bcache_device_free() bcache0 stopped
[ 176.083928] bcache: bch_count_io_errors() dm-2: IO error on writing btree.
[ 176.188371] bcache: cache_set_free() Cache set
fce8d558-4657-47dc-ab37-226ada14daf5 unregistered
[ 176.841497] bcache: bcache_device_free() bcache1 stopped
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 1.81834 s, 591 MB/s
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.5749 s, 417 MB/s
[1]- Exit 1 dd if=/dev/zero of=/dev/bcache1 bs=4k
[2]+ Exit 1 dd if=/dev/zero of=/dev/bcache0 bs=4k
root@guest-bcache:~# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
loop2 7:2 0 1G 0 loop
└─fake-loop2 253:2 0 1024M 0 dm
fake-loop0 253:0 0 1G 0 dm
Notice that bcache0 and bcache1 are missing.
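The same check can be done without reading the lsblk tree by eye. This is a minimal sketch under stated assumptions: list_bcache_devices is a hypothetical helper name, and it only looks for names of the form bcacheN in whatever text it reads from stdin.

```shell
# List the bcache devices named in `lsblk` output read from stdin.
# Prints each bcacheN name once, sorted; prints nothing if none appear.
list_bcache_devices() {
    # `|| true` keeps the pipeline successful when grep finds no match.
    { grep -o 'bcache[0-9][0-9]*' || true; } | sort -u
}

# Example usage on a live system:
#   lsblk -e 252 | list_bcache_devices
```

An empty result corresponds to the lsblk output above, where neither bcache device remains.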
Modified kernel
---------------
root@guest-bcache:~# uname -rv
4.18.0-23-generic #24+test20190627b1 SMP Thu Jun 27 13:29:22 UTC 2019
root@guest-bcache:~# lsblk -e 252
root@guest-bcache:~#
root@guest-bcache:~# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 146.600391] bcache: register_bdev() registered backing device dm-0
[ 146.608618] bcache: register_bdev() registered backing device dm-1
[ 146.617808] bcache: run_cache_set() invalidating existing data
[ 146.632355] bcache: register_cache() registered cache device dm-2
[ 147.615003] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set
6673bcb3-7a64-4675-a82f-59bb66886d66
[ 147.633610] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set
6673bcb3-7a64-4675-a82f-59bb66886d66
root@guest-bcache:~# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
└─bcache1 251:128 0 1024M 0 disk
loop2 7:2 0 1G 0 loop
└─fake-loop2 253:2 0 1024M 0 dm
├─bcache0 251:0 0 1024M 0 disk
└─bcache1 251:128 0 1024M 0 disk
root@guest-bcache:~# echo writeback | tee /sys/block/dm-*/bcache/cache_mode
writeback
root@guest-bcache:~# cat /sys/block/dm-*/bcache/cache_mode
writethrough [writeback] writearound none
writethrough [writeback] writearound none
root@guest-bcache:~# ./dm_fake_dev.sh /dev/loop0 bad
[ 174.138534] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 174.145142] Buffer I/O error on dev dm-0, logical block 262128, async page
read
[ 174.152728] bcache: register_bcache() error /dev/dm-0: device already
registered (emitting change event)
[ 174.154780] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 174.159945] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 174.162933] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 174.168696] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 174.172368] Buffer I/O error on dev bcache0, logical block 262112, async
page read
[ 174.175272] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 174.178593] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 174.181896] Buffer I/O error on dev bcache0, logical block 1, async page
read
root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero
of=/dev/bcache0 bs=4k &
[1] 1377
[2] 1378
[ 183.348428] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 183.354587] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 183.360488] Buffer I/O error on dev bcache0, logical block 0, lost async
page write
[ 183.364666] Buffer I/O error on dev bcache0, logical block 1, lost async
page write
[ 183.368326] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
...
[ 183.430652] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 183.434399] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 183.438198] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
[ 183.441991] bcache: bch_count_backing_io_errors() dm-0: IO error on backing
device, unrecoverable
...
[ 183.635500] bcache: bch_cached_dev_error() stop bcache0: too many IO errors
on backing device dm-0
[ 183.635500]
[ 184.840023] bcache: bcache_device_free() bcache0 stopped
dd: error writing '/dev/bcache0': No space left on device
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.18238 s, 492 MB/s
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 3.69895 s, 290 MB/s
[1]- Exit 1 dd if=/dev/zero of=/dev/bcache1 bs=4k
[2]+ Exit 1 dd if=/dev/zero of=/dev/bcache0 bs=4k
root@guest-bcache:~# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
└─bcache1 251:128 0 1024M 0 disk
loop2 7:2 0 1G 0 loop
└─fake-loop2 253:2 0 1024M 0 dm
└─bcache1 251:128 0 1024M 0 disk
fake-loop0 253:0 0 1G 0 dm
Notice that only bcache0 is stopped; bcache1 is still present and
accepts I/O. After a reboot, both bcache devices are reattached.
root@guest-bcache:~# dd if=/dev/zero of=/dev/bcache1 bs=4k
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.79076 s, 224 MB/s
root@guest-bcache:~#
root@guest-bcache:~# reboot
root@guest-bcache:~# ./setup-two-bcache-one-cache.reboot.sh
[ 104.421020] bcache: register_bdev() registered backing device dm-0
[ 104.492000] bcache: register_bdev() registered backing device dm-1
[ 104.685632] bcache: bch_journal_replay() journal replay done, 97526 keys in
57 entries, seq 359
[ 104.695263] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set
6673bcb3-7a64-4675-a82f-59bb66886d66
[ 104.704708] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set
6673bcb3-7a64-4675-a82f-59bb66886d66
[ 104.709640] bcache: register_cache() registered cache device dm-2
root@guest-bcache:~# lsblk -e 252
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 1G 0 loop
└─fake-loop0 253:0 0 1024M 0 dm
└─bcache0 251:0 0 1024M 0 disk
loop1 7:1 0 1G 0 loop
└─fake-loop1 253:1 0 1024M 0 dm
└─bcache1 251:128 0 1024M 0 disk
loop2 7:2 0 1G 0 loop
└─fake-loop2 253:2 0 1024M 0 dm
├─bcache0 251:0 0 1024M 0 disk
└─bcache1 251:128 0 1024M 0 disk
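To compare the two kernels without reading the whole kernel log, the bcache_device_free() messages can be pulled out of it. A minimal sketch follows; stopped_bcache_devices is a hypothetical helper name, and it assumes the "bcache_device_free() bcacheN stopped" message format seen in the logs above.

```shell
# Print the bcache devices that the kernel log (read from stdin)
# reports as stopped, one per line, in order of appearance.
stopped_bcache_devices() {
    sed -n 's/.*bcache_device_free() \(bcache[0-9][0-9]*\) stopped.*/\1/p'
}

# Example usage on a live system:
#   dmesg | stopped_bcache_devices
```

On the original kernel log this would list both bcache0 and bcache1; on the modified kernel log, only bcache0.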
--
https://bugs.launchpad.net/bugs/1829563
Title:
bcache: risk of data loss on I/O errors in backing or caching devices
Status in linux package in Ubuntu:
Invalid
Status in linux source package in Bionic:
In Progress
Status in linux source package in Cosmic:
In Progress
Bug description:
[Impact]
* The bcache code in Bionic lacks several fixes to handle
I/O errors in both backing devices and caching devices.
* Partial or permanent errors in backing or caching devices,
especially in writeback mode, can lead to data loss, and/or
the application may not be notified about failed I/O requests.
* The bcache device might remain available for I/O requests
even if the backing device is offline, so the result of
writes is undefined.
[Test Case]
* Detailed test cases/steps for the behavior of many patches
with code logic changes are provided in bug comments.
* The patchset has been tested for regressions on each cache
mode (writethrough, writeback, writearound, none) with the
xfstests test suite (on ext4) and fio (sequential + random
read-write).
[Regression Potential]
* The patchset is relatively large and touches several areas
in the bcache code. However, synthetic testing of the patches
has been performed, and extensive regression/stress tests
were run (as mentioned in the Test Case section).
* Many patches in the patchset are 'Fixes' patches to other
patches, and no further 'Fixes' currently exist upstream.
[Other Info]
* Canonical Field Eng. deploys bcache+writeback extensively
(e.g., BootStack, UA cloud, except rare all-flash cases).
[Original Bug Description]
This is a request for a backport of the following upstream patch from
4.18:
"bcache: stop bcache device when backing device is offline"
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee
Field engineering uses bcache quite extensively and it would be good
to have this in the GA/bionic kernel.