Verification done for Disco (one patch change only). Only one of the two bcache devices stops working when one backing device fails (see comment #21 for details).

# uname -rv
5.0.0-22-generic #23-Ubuntu SMP Tue Jul 23 17:23:54 UTC 2019

# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[ 25.748828] bcache: register_bdev() registered backing device dm-1
[ 25.759145] bcache: register_bdev() registered backing device dm-0
[ 25.767247] bcache: run_cache_set() invalidating existing data
[ 25.778928] bcache: register_cache() registered cache device dm-2
[ 26.768350] bcache: bch_cached_dev_attach() Caching dm-0 as bcache1 on set 2bf1e70a-6f20-4680-bc63-f803142f294d
[ 26.795147] bcache: bch_cached_dev_attach() Caching dm-1 as bcache0 on set 2bf1e70a-6f20-4680-bc63-f803142f294d

# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk

# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback

# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always

# ./dm_fake_dev.sh /dev/loop0 bad
[ 42.723192] Buffer I/O error on dev dm-0, logical block 262128, async page read
[ 42.730031] Buffer I/O error on dev dm-0, logical block 262128, async page read
[ 42.736198] bcache: register_bcache() error /dev/dm-0: device already registered (emitting change event)
[ 42.738697] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 42.742277] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
#
[ 42.746748] Buffer I/O error on dev bcache1, logical block 262112, async page read
[ 42.752642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 42.755650] Buffer I/O error on dev bcache1, logical block 262112, async page read
[ 42.758209] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 42.760642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 42.762860] Buffer I/O error on dev bcache1, logical block 1, async page read

# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1557
[2] 1558
#
[ 58.982340] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 58.984076] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 58.985718] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 58.987382] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 58.989011] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 58.990645] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 58.992293] Buffer I/O error on dev bcache1, logical block 0, lost async page write
[ 58.993733] Buffer I/O error on dev bcache1, logical block 1, lost async page write
[ 58.995201] Buffer I/O error on dev bcache1, logical block 2, lost async page write
[ 58.996651] Buffer I/O error on dev bcache1, logical block 3, lost async page write
...
[ 59.096950] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 59.098669] bcache: bch_count_backing_io_errors() dm-0: IO error on backing device, unrecoverable
[ 59.100621] bcache: bch_cached_dev_error() stop bcache1: too many IO errors on backing device dm-0
[ 59.100621] dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out
[ 60.111733] bcache: bcache_device_free() bcache1 stopped
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.10457 s, 510 MB/s
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.67245 s, 230 MB/s

# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
fake-loop0     253:0    0    1G  0 dm

Only bcache1 was stopped. bcache0 remains working.

# reboot

# ./setup-two-bcache-one-cache.reboot.sh >/dev/null 2>&1
[ 17.606164] bcache: register_bdev() registered backing device dm-0
[ 17.672177] bcache: register_bdev() registered backing device dm-1
[ 17.752456] bcache: bch_journal_replay() journal replay done, 4936 keys in 6 entries, seq 207
[ 17.760279] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 2bf1e70a-6f20-4680-bc63-f803142f294d
[ 17.766759] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 2bf1e70a-6f20-4680-bc63-f803142f294d
[ 17.771989] bcache: register_cache() registered cache device dm-2

# lsblk -e 252
NAME           MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0            7:0    0    1G  0 loop
└─fake-loop0   253:0    0 1024M  0 dm
  └─bcache0    251:0    0 1024M  0 disk
loop1            7:1    0    1G  0 loop
└─fake-loop1   253:1    0 1024M  0 dm
  └─bcache1    251:128  0 1024M  0 disk
loop2            7:2    0    1G  0 loop
└─fake-loop2   253:2    0 1024M  0 dm
  ├─bcache0    251:0    0 1024M  0 disk
  └─bcache1    251:128  0 1024M  0 disk

Both bcache devices reattached after reboot.
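
As a rough aid for reading kernel logs like the one above, the following sketch counts the "IO error on backing device" messages per backing device and lists the bcache devices reported as stopped. The function name `summarize_bcache_errors` is hypothetical and is not part of the scripts used in this verification. It only assumes the `bch_count_backing_io_errors()` and `bcache_device_free()` message formats that appear in the log above.

```shell
# Hypothetical helper (not one of the test scripts from this bug):
# reads kernel log text on stdin and prints one line per backing device
# with its error count, plus one line per bcache device that was stopped.
summarize_bcache_errors() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                # "... bch_count_backing_io_errors() dm-0: IO error ..."
                if ($i == "bch_count_backing_io_errors()") {
                    dev = $(i + 1)
                    sub(/:$/, "", dev)      # drop the trailing colon
                    errors[dev]++
                }
                # "... bcache_device_free() bcache1 stopped"
                if ($i == "bcache_device_free()")
                    stopped[$(i + 1)] = 1
            }
        }
        END {
            for (d in errors)  printf "errors %s %d\n", d, errors[d]
            for (d in stopped) printf "stopped %s\n", d
        }
    ' | sort
}
```

It could be fed from the kernel log, for example with `dmesg | summarize_bcache_errors`. For the session above, the expectation would be a single `stopped bcache1` line and errors reported only for dm-0.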

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle I/O errors
     in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     especially in writeback mode, can lead to data loss and/or to the
     application not being notified about failed I/O requests.

   * The bcache device might remain available for I/O requests even if
     the backing device is offline, so the outcome of writes is undefined.

  [Test Case]

   * Detailed test cases/steps for the behavior of many patches with
     code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache mode
     (writethrough, writeback, writearound, none) with the xfstests
     test suite (on ext4) and fio (sequential + random read-write).

  [Regression Potential]

   * The patchset is relatively large and touches several areas in
     bcache code; however, synthetic testing of the patches has been
     performed, and extensive regression/stress tests were run (as
     mentioned in the Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other patches,
     and no further 'Fixes' currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively
     (e.g., BootStack, UA cloud, except rare all-flash cases).

  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1829563/+subscriptions

