Hi,

Our largest BTRFS filesystem is damaged and I'm unsure whether it is
recoverable. This is a 20TB filesystem with ~13TB used, in a
virtual machine using virtio-scsi backed by Ceph (Firefly 0.8.10).
The following messages have become more frequent:

fileserver kernel: sd 0:0:1:0: [sdb] tag#<number> abort

This can sometimes happen under heavy IO load, so I didn't immediately
spot the new cause behind them: a failing disk. Then I saw this after a
failed monthly scrub:

Mar 13 03:49:01 fileserver kernel: BTRFS: checksum error at logical
13373533028352 on dev /dev/sdb, sector 26004838336, root 257, inode
8155339, offset 131072, length 4096, links 1 (path: <damaged_file>)

This was surprising as I thought Ceph would not give back bad data. I
also saw messages of this kind:

Mar  7 18:33:53 fileserver kernel: BTRFS warning (device sdb): csum
failed ino 8155339 off 1073152 csum 1108896639 expected csum 1374028982

The csum was always 1108896639 for different chunks, so I suspect this is
the csum of a zero-filled block of data. So maybe, in case of timeouts,
virtio-scsi just returns a block full of zeros. I actually tried to read
the affected files and saw Ceph OSD timeouts on the disk I suspected
of failing at the same time I got the IO error.
The disk is confirmed to have reallocated ~40 sectors over the same period
the problems appeared. It is behind an HP SATA/SAS controller, so it isn't
easy to get the full SMART info.
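
The zero-block hypothesis above can be checked offline. The following is
only a minimal sketch: it assumes the data checksum is CRC32C (the BTRFS
default) over a 4096-byte block, and since I am not certain whether this
kernel prints the value with or without the final bit inversion, it
computes both variants. The helper names are made up for illustration.

```python
# Sketch: does the observed csum match CRC32C of a zero-filled 4 KiB block?
# Assumptions: CRC32C (Castagnoli), 4096-byte block, seed 0xFFFFFFFF.

def _make_table():
    # Table for the reflected Castagnoli polynomial 0x82F63B78.
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table

_TABLE = _make_table()

def crc32c_raw(data, seed=0xFFFFFFFF):
    """CRC32C without the final inversion."""
    crc = seed
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc

def crc32c(data):
    """Standard CRC32C (final value inverted)."""
    return crc32c_raw(data) ^ 0xFFFFFFFF

OBSERVED = 1108896639          # "csum" value from the kernel log
zero_block = bytes(4096)       # assumed block size

candidates = {
    "standard CRC32C": crc32c(zero_block),
    "without final inversion": crc32c_raw(zero_block),
}
for name, value in candidates.items():
    verdict = "MATCH" if value == OBSERVED else "no match"
    print(f"{name}: {value} ({verdict})")
```

If neither variant matches, the returned blocks were probably not all
zeros (or one of the assumptions above is wrong), and dumping one of the
affected blocks directly would be the next step.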

I restored all affected files and launched another full scrub, which passed
successfully, but unfortunately the damage got worse shortly after:

Mar 16 23:30:09 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:09 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:09 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:09 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:10 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:10 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:10 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:10 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:10 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:20 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:20 fileserver kernel: ------------[ cut here ]------------
Mar 16 23:30:20 fileserver kernel: WARNING: CPU: 2 PID: 3556 at
fs/btrfs/super.c:260 __btrfs_abort_transaction+0x46/0x110()
Mar 16 23:30:20 fileserver kernel: BTRFS: Transaction aborted (error -5)
Mar 16 23:30:20 fileserver kernel: Modules linked in: nfsd auth_rpcgss
oid_registry nfs_acl ipv6 binfmt_misc mousedev 8250 processor
crc32c_intel psmouse thermal_sys serial_core button dm_zero dm_thin_pool
dm_persistent_data dm_bio_prison dm_service_time dm_round_robin
dm_queue_length dm_multipath dm_log_userspace dm_delay virtio_console
xts gf128mul aes_x86_64 cbc sha512_generic sha256_generic sha1_generic
scsi_transport_iscsi fuse overlay xfs libcrc32c nfs lockd grace sunrpc
fscache jfs reiserfs multipath linear raid10 raid1 raid0 dm_raid raid456
async_raid6_recov async_memcpy async_pq async_xor async_tx md_mod
dm_snapshot dm_bufio dm_crypt dm_mirror dm_region_hash dm_log usbhid
xhci_pci xhci_hcd ohci_pci ohci_hcd uhci_hcd usb_storage ehci_pci
ehci_hcd usbcore usb_common sr_mod cdrom sg virtio_net
Mar 16 23:30:20 fileserver kernel: CPU: 2 PID: 3556 Comm:
btrfs-transacti Not tainted 4.1.15-gentoo-r1 #2
Mar 16 23:30:20 fileserver kernel: Hardware name: QEMU Standard PC
(i440FX + PIIX, 1996), BIOS
rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Mar 16 23:30:20 fileserver kernel:  0000000000000000 ffffffff8163e153
ffffffff81518242 ffff88082c1ebd28
Mar 16 23:30:20 fileserver kernel:  ffffffff8104ab7c ffff88042f73b600
00000000fffffffb ffff88082c68c800
Mar 16 23:30:20 fileserver kernel:  ffffffff8154f9d0 00000000000004a4
ffffffff8104abf5 ffffffff81636528
Mar 16 23:30:20 fileserver kernel: Call Trace:
Mar 16 23:30:20 fileserver kernel:  [<ffffffff81518242>] ?
dump_stack+0x40/0x50
Mar 16 23:30:20 fileserver kernel:  [<ffffffff8104ab7c>] ?
warn_slowpath_common+0x7c/0xb0
Mar 16 23:30:20 fileserver kernel:  [<ffffffff8104abf5>] ?
warn_slowpath_fmt+0x45/0x50
Mar 16 23:30:20 fileserver kernel:  [<ffffffff81252096>] ?
__btrfs_abort_transaction+0x46/0x110
Mar 16 23:30:20 fileserver kernel:  [<ffffffff812d3e6e>] ?
__btrfs_run_delayed_items+0xde/0x1d0
Mar 16 23:30:20 fileserver kernel:  [<ffffffff81280068>] ?
btrfs_commit_transaction+0x2b8/0xa60
Mar 16 23:30:20 fileserver kernel:  [<ffffffff8128089b>] ?
start_transaction+0x8b/0x5a0
Mar 16 23:30:20 fileserver kernel:  [<ffffffff8127bd0d>] ?
transaction_kthread+0x1cd/0x240
Mar 16 23:30:20 fileserver kernel:  [<ffffffff8127bb40>] ?
btrfs_cleanup_transaction+0x530/0x530
Mar 16 23:30:20 fileserver kernel:  [<ffffffff81066e8c>] ? kthread+0xbc/0xe0
Mar 16 23:30:20 fileserver kernel:  [<ffffffff81066dd0>] ?
kthread_create_on_node+0x180/0x180
Mar 16 23:30:20 fileserver kernel:  [<ffffffff8151da22>] ?
ret_from_fork+0x42/0x70
Mar 16 23:30:20 fileserver kernel:  [<ffffffff81066dd0>] ?
kthread_create_on_node+0x180/0x180
Mar 16 23:30:20 fileserver kernel: ---[ end trace f03445c45d440372 ]---
Mar 16 23:30:20 fileserver kernel: BTRFS: error (device sdb) in
__btrfs_run_delayed_items:1188: errno=-5 IO failure
Mar 16 23:30:20 fileserver kernel: BTRFS info (device sdb): forced readonly
Mar 16 23:30:20 fileserver kernel: BTRFS warning (device sdb): Skipping
commit of aborted transaction.
Mar 16 23:30:20 fileserver kernel: BTRFS: error (device sdb) in
cleanup_transaction:1692: errno=-5 IO failure
Mar 16 23:30:22 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
Mar 16 23:30:22 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415463870464
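
One detail in the log above may be worth decoding. As far as I understand
the message format (this is an assumption), the first number after "bad
tree block start" is the block address found in the header that was read,
and the second is the address that was expected. A short sketch to look
at the found value byte by byte:

```python
# Decode the "found" start value from the "bad tree block start" lines.
found_start = 72340172838076673

# On-disk BTRFS integers are little-endian; view the value as 8 raw bytes.
as_bytes = found_start.to_bytes(8, "little")

print(hex(found_start))   # 0x101010101010101
print(as_bytes.hex())     # 0101010101010101
print(set(as_bytes))      # {1}
```

Every byte of the field is 0x01, which suggests the header (and possibly
the whole block) was overwritten with a repeating 0x01 pattern rather
than with zeros or random garbage.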

I removed the failing disk from the cluster and rebooted the server. The
filesystem mounted fine but some time later I got these:

Mar 17 03:49:48 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415464230912
Mar 17 03:49:48 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415464230912
Mar 17 03:49:48 fileserver kernel: BTRFS (device sdb): bad tree block
start 72340172838076673 3415464230912

The filesystem didn't remount read-only this time, but I installed a new
kernel (4.9.6 with the r1 Gentoo patchset instead of 4.1.15-r1) and
rebooted again. I have a snapshot of the full device at the time of each
reboot if it can help (I can relatively easily make rw copies and work
on them without affecting the ro snapshots), plus an earlier one from 4
weeks ago.

Can someone please help me determine whether I can save this filesystem,
and how? I suspect the amount of damage is small (there were only a
handful of damaged sectors before the disk was removed). I'm just not
sure how I can check that the internal BTRFS structures are still sound
and won't create a snowball effect destroying much more.

It is currently still used in production in this state, and I'm trying to
avoid a painful switch to a remote, slow snapshot from yesterday while
beginning a very long recovery from scratch (this is at least a two-week
procedure, maybe more).
I'll catch some sleep right now (it's 5:28 AM here) but I'll be able to
work on this in 3 or 4 hours.

Best regards,

Lionel