Verification done for Disco (one patch change only).

Only one of the two bcache devices stops working when one backing device
fails (see comment #21 for details).

# uname -rv
5.0.0-22-generic #23-Ubuntu SMP Tue Jul 23 17:23:54 UTC 2019

# ./setup-two-bcache-one-cache.sh >/dev/null 2>&1
[   25.748828] bcache: register_bdev() registered backing device dm-1
[   25.759145] bcache: register_bdev() registered backing device dm-0
[   25.767247] bcache: run_cache_set() invalidating existing data
[   25.778928] bcache: register_cache() registered cache device dm-2
[   26.768350] bcache: bch_cached_dev_attach() Caching dm-0 as bcache1 on set 
2bf1e70a-6f20-4680-bc63-f803142f294d
[   26.795147] bcache: bch_cached_dev_attach() Caching dm-1 as bcache0 on set 
2bf1e70a-6f20-4680-bc63-f803142f294d

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk 

# echo writeback | tee /sys/block/bcache*/bcache/cache_mode
writeback

# echo always | tee /sys/block/bcache*/bcache/stop_when_cache_set_failed
always
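
The two sysfs writes above can be read back to confirm they took effect. A
minimal sketch, assuming the kernel marks the active choice in square brackets
(as bcache does for cache_mode); the helper name active_choice is hypothetical
and not part of the test scripts:

```shell
# Hypothetical helper: print the bracketed (active) word from a sysfs
# attribute whose content looks like "writethrough [writeback] writearound none".
active_choice() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Report both settings for every bcache device that is currently registered.
for dev in /sys/block/bcache*/bcache; do
    [ -d "$dev" ] || continue   # glob did not match: no bcache devices
    printf '%s: cache_mode=%s stop_when_cache_set_failed=%s\n' \
        "${dev%/bcache}" \
        "$(active_choice "$dev/cache_mode")" \
        "$(active_choice "$dev/stop_when_cache_set_failed")"
done
```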

# ./dm_fake_dev.sh /dev/loop0 bad
[   42.723192] Buffer I/O error on dev dm-0, logical block 262128, async page 
read
[   42.730031] Buffer I/O error on dev dm-0, logical block 262128, async page 
read
[   42.736198] bcache: register_bcache() error /dev/dm-0: device already 
registered (emitting change event)
[   42.738697] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.742277] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
# [   42.746748] Buffer I/O error on dev bcache1, logical block 262112, async 
page read
[   42.752642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.755650] Buffer I/O error on dev bcache1, logical block 262112, async 
page read
[   42.758209] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.760642] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   42.762860] Buffer I/O error on dev bcache1, logical block 1, async page read


# dd if=/dev/zero of=/dev/bcache1 bs=4k & dd if=/dev/zero of=/dev/bcache0 bs=4k &
[1] 1557
[2] 1558
# [   58.982340] bcache: bch_count_backing_io_errors() dm-0: IO error on 
backing device, unrecoverable
[   58.984076] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.985718] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.987382] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.989011] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.990645] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   58.992293] Buffer I/O error on dev bcache1, logical block 0, lost async 
page write
[   58.993733] Buffer I/O error on dev bcache1, logical block 1, lost async 
page write
[   58.995201] Buffer I/O error on dev bcache1, logical block 2, lost async 
page write
[   58.996651] Buffer I/O error on dev bcache1, logical block 3, lost async 
page write
...
[   59.096950] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   59.098669] bcache: bch_count_backing_io_errors() dm-0: IO error on backing 
device, unrecoverable
[   59.100621] bcache: bch_cached_dev_error() stop bcache1: too many IO errors 
on backing device dm-0
[   59.100621]
dd: error writing '/dev/bcache1': No space left on device
262142+0 records in
262141+0 records out

[   60.111733] bcache: bcache_device_free() bcache1 stopped

1073729536 bytes (1.1 GB, 1.0 GiB) copied, 2.10457 s, 510 MB/s
dd: error writing '/dev/bcache0': No space left on device
262142+0 records in
262141+0 records out
1073729536 bytes (1.1 GB, 1.0 GiB) copied, 4.67245 s, 230 MB/s

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
fake-loop0   253:0    0    1G  0 dm 

Only bcache1 was stopped; bcache0 keeps working.
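
The same check can be scripted instead of read off the lsblk output. A minimal
sketch, assuming a stopped bcache device disappears from /sys/block (consistent
with lsblk no longer listing it); the helper name bcache_present and its
optional sysfs-root argument are hypothetical:

```shell
# Hypothetical helper: succeed if the named bcache device is still registered.
# The optional second argument overrides the sysfs root (default: /sys).
bcache_present() {
    [ -d "${2:-/sys}/block/$1/bcache" ]
}

# Report the state of the two devices used in this test.
for name in bcache0 bcache1; do
    if bcache_present "$name"; then
        echo "$name: present"
    else
        echo "$name: stopped"
    fi
done
```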

# reboot

# ./setup-two-bcache-one-cache.reboot.sh >/dev/null 2>&1
[   17.606164] bcache: register_bdev() registered backing device dm-0
[   17.672177] bcache: register_bdev() registered backing device dm-1
[   17.752456] bcache: bch_journal_replay() journal replay done, 4936 keys in 6 
entries, seq 207
[   17.760279] bcache: bch_cached_dev_attach() Caching dm-1 as bcache1 on set 
2bf1e70a-6f20-4680-bc63-f803142f294d
[   17.766759] bcache: bch_cached_dev_attach() Caching dm-0 as bcache0 on set 
2bf1e70a-6f20-4680-bc63-f803142f294d
[   17.771989] bcache: register_cache() registered cache device dm-2

# lsblk -e 252
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0          7:0    0    1G  0 loop 
└─fake-loop0 253:0    0 1024M  0 dm   
  └─bcache0  251:0    0 1024M  0 disk 
loop1          7:1    0    1G  0 loop 
└─fake-loop1 253:1    0 1024M  0 dm   
  └─bcache1  251:128  0 1024M  0 disk 
loop2          7:2    0    1G  0 loop 
└─fake-loop2 253:2    0 1024M  0 dm   
  ├─bcache0  251:0    0 1024M  0 disk 
  └─bcache1  251:128  0 1024M  0 disk 


Both bcache devices reattached after reboot.

** Tags removed: verification-needed-disco
** Tags added: verification-done-disco

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1829563

Title:
  bcache: risk of data loss on I/O errors in backing or caching devices

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Won't Fix
Status in linux source package in Disco:
  Fix Committed
Status in linux source package in Eoan:
  Fix Committed

Bug description:
  [Impact]

   * The bcache code in Bionic lacks several fixes to handle
     I/O errors in both backing devices and caching devices.

   * Partial or permanent errors in backing or caching devices,
     especially in writeback mode, can lead to data loss and/or
     to the application not being notified of failed I/O requests.

   * The bcache device might remain available for I/O requests
     even if the backing device is offline, so the result of
     writes is undefined.
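
  Whether an application notices a failed write also depends on how it
  writes: buffered writes are generally reported as failed only when the
  data is flushed. A minimal sketch of a write that waits for the flush,
  assuming GNU dd; the helper name write_checked is hypothetical:

```shell
# Hypothetical helper: write one 4 KiB block to the given path and force it
# to stable storage (conv=fsync), so that an I/O error during the flush is
# returned as a non-zero exit status instead of being lost.
# WARNING: this overwrites the start of whatever path is passed in.
write_checked() {
    dd if=/dev/zero of="$1" bs=4k count=1 conv=fsync 2>/dev/null
}
```

  On the test setup this could be called as "write_checked /dev/bcache1"
  and its exit status checked; the device name is taken from the
  verification log and is otherwise an assumption.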

  [Test Case]

   * Detailed test cases/steps for the behavior of many patches
     with code logic changes are provided in bug comments.

   * The patchset has been tested for regressions on each cache
     mode (writethrough, writeback, writearound, none) with the
     xfstests test suite (on ext4) and fio (sequential + random
     read-write).
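
  The cache-mode matrix for the fio part can be sketched as a dry run that
  only prints the commands. The fio parameters, the raw-device target and
  the helper name plan_fio_runs are illustrative assumptions, not the
  settings actually used for the regression tests:

```shell
# Hypothetical helper: print (do not run) one command line per cache mode
# and access pattern. rw = sequential mixed read/write, randrw = random
# mixed read/write. All fio options shown are assumptions for illustration.
plan_fio_runs() {
    dev=$1
    for mode in writethrough writeback writearound none; do
        for rw in rw randrw; do
            echo "echo $mode > /sys/block/$dev/bcache/cache_mode &&" \
                 "fio --name=$mode-$rw --filename=/dev/$dev --rw=$rw" \
                 "--bs=4k --size=256m --direct=1"
        done
    done
}

plan_fio_runs bcache0
```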

  [Regression Potential]

   * The patchset is relatively large and touches several areas
     in bcache code; however, synthetic testing of the patches
     has been performed, and extensive regression/stress tests
     were run (as mentioned in the Test Case section).

   * Many patches in the patchset are 'Fixes' patches to other
     patches, and no further 'Fixes' currently exist upstream.

  [Other Info]

   * Canonical Field Eng. deploys bcache+writeback extensively
     (e.g., BootStack, UA cloud, except rare all-flash cases).

  [Original Bug Description]

  This is a request for a backport of the following upstream patch from
  4.18:

  "bcache: stop bcache device when backing device is offline"
  
https://github.com/torvalds/linux/commit/0f0709e6bfc3ce4e8e1c0e8573490c45f76cfeee

  Field engineering uses bcache quite extensively and it would be good
  to have this in the GA/bionic kernel.

