OK, so I've worked out that, despite what the BTRFS wiki seems to imply, the 'multi parity' support just isn't stable enough to use. So I'm trying to revert to what I had before.
My setup consists of:
* 2 x 3TB drives
* 3 x 2TB drives
I've got (had?) about 4.9TB of data.
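As a rough sanity check on those numbers (a sketch only: it assumes decimal units, ignores filesystem and metadata overhead, and the variable and function names are just for illustration), the data has to fit on the two 3TB drives alone, and later on a 5-device RAID6 built from 2TB members:

```shell
#!/bin/sh
# Rough capacity check in decimal GB. Ignores metadata overhead and any
# TB vs TiB difference, so treat the result as a ballpark only.
DATA_GB=4900                      # ~4.9TB of data
SINGLE_GB=$((2 * 3000))           # two 3TB drives, 'single' profile, no redundancy
RAID6_GB=$(( (5 - 2) * 2000 ))    # 5-device RAID6 of 2TB members; two devices' worth is parity

# Print "yes" if the first size fits within the second, otherwise "no".
fits() { [ "$1" -le "$2" ] && echo yes || echo no; }

echo "single on 2 x 3TB: ${SINGLE_GB}GB, fits: $(fits "$DATA_GB" "$SINGLE_GB")"
echo "RAID6 on 5 x 2TB:  ${RAID6_GB}GB, fits: $(fits "$DATA_GB" "$RAID6_GB")"
```

Both come out at about 6TB, so on paper there is roughly 1.1TB of headroom either way.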
My idea was to convert the existing setup using a balance to a 'single'
setup, delete the 3 x 2TB drives from the BTRFS system, then create a
new mdadm-based RAID6 (5 drives degraded to 3), create a new filesystem
on that, then copy the data across.
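Spelled out as commands, the plan looks roughly like the sketch below. It only prints each step rather than running anything. The mount points, the device names and the choice of ext4 are all hypothetical placeholders, not my real configuration:

```shell
#!/bin/sh
# Dry-run sketch of the plan: prints each command instead of executing it.
# Every path and device name here is a hypothetical placeholder.
POOL=/mnt/pool                          # assumed mount point of the existing BTRFS
NEWFS=/mnt/new                          # assumed mount point for the new filesystem
OLD_2TB="/dev/sdc /dev/sdd /dev/sde"    # assumed names of the 3 x 2TB drives

plan() {
    # 1. Convert data and metadata to the 'single' profile.
    echo "btrfs balance start -dconvert=single -mconvert=single -f $POOL"
    # 2. Remove the three 2TB drives from the BTRFS pool.
    echo "btrfs device delete $OLD_2TB $POOL"
    # 3. Create a 5-device RAID6 with two members 'missing' (degraded to 3).
    echo "mdadm --create /dev/md0 --level=6 --raid-devices=5 $OLD_2TB missing missing"
    # 4. Make a new filesystem (ext4 is an arbitrary assumption), mount it, copy the data.
    echo "mkfs.ext4 /dev/md0"
    echo "mount /dev/md0 $NEWFS"
    echo "rsync -aHAX $POOL/ $NEWFS/"
}

plan
```

Once the copy is verified, the two 3TB drives would presumably be added to the array to bring it out of degraded mode.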
So, great - first the balance:
$ btrfs balance start -dconvert=single -mconvert=single -f
(Yes, I know it'll reduce the metadata redundancy.)
This was promptly followed by a system crash.
After a reboot, I can no longer mount the BTRFS filesystem read-write:
[ 134.768908] BTRFS info (device xvdd): disk space caching is enabled
[ 134.769032] BTRFS: has skinny extents
[ 134.769856] BTRFS: failed to read the system array on xvdd
[ 134.776055] BTRFS: open_ctree failed
[ 143.900055] BTRFS info (device xvdd): allowing degraded mounts
[ 143.900152] BTRFS info (device xvdd): not using ssd allocation scheme
[ 143.900243] BTRFS info (device xvdd): disk space caching is enabled
[ 143.900330] BTRFS: has skinny extents
[ 143.901860] BTRFS warning (device xvdd): devid 4 uuid 61ccce61-9787-453e-b793-1b86f8015ee1 is missing
[ 146.539467] BTRFS: missing devices(1) exceeds the limit(0), writeable mount is not allowed
[ 146.552051] BTRFS: open_ctree failed
I can mount it read only - but then I also get crashes when it seems to
hit a read error:
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3245290974 wanted 982056704 mirror 0
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 390821102 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 550556475 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1279883714 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2566472073 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1876236691 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3350537857 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3319706190 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2377458007 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2066127208 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 657140479 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1239359620 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1598877324 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 1082738394 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 371906697 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2156787247 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 3777709399 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 180814340 wanted 982056704 mirror 1
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2401!
invalid opcode: 0000 [#1] SMP
Modules linked in: btrfs x86_pkg_temp_thermal coretemp crct10dif_pclmul
xor aesni_intel aes_x86_64 lrw gf128mul glue_helper pcspkr raid6_pq
ablk_helper cryptd nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
xen_netfront crc32c_intel xen_gntalloc xen_evtchn ipv6 autofs4
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 2610978113 wanted 982056704 mirror 1
BTRFS info (device xvdc): csum failed ino 42179 extent 8690008064 csum 59610051 wanted 982056704 mirror 1
CPU: 1 PID: 1273 Comm: kworker/u4:4 Not tainted 4.4.13-1.el7xen.x86_64 #1
Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
task: ffff880079ce12c0 ti: ffff880078788000 task.ti: ffff880078788000
RIP: e030:[<ffffffffa039e0e0>] [<ffffffffa039e0e0>] btrfs_check_repairable+0x100/0x110 [btrfs]
RSP: e02b:ffff88007878bcc8 EFLAGS: 00010297
RAX: 0000000000000001 RBX: ffff880079db2080 RCX: 0000000000000003
RDX: 0000000000000003 RSI: 000004db13730000 RDI: ffff88007889ef38
RBP: ffff88007878bce0 R08: 000004db01c00000 R09: 000004dbc1c00000
R10: ffff88006bb0c1b8 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88007b213ea8 R14: 0000000000001000 R15: 0000000000000000
FS: 00007fbf2fdc0880(0000) GS:ffff88007f500000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fbf2d96702b CR3: 000000007969f000 CR4: 0000000000042660
Stack:
ffffea00019db180 0000000000010000 ffff88007b213f30 ffff88007878bd88
ffffffffa03a0808 ffff880002d15500 ffff88007878bd18 ffff880079ce12c0
ffff88007b213e40 000000000000001f ffff880000000000 ffff88006bb0c048
Call Trace:
[<ffffffffa03a0808>] end_bio_extent_readpage+0x428/0x560 [btrfs]
[<ffffffff812f40c0>] bio_endio+0x40/0x60
[<ffffffffa0375a6c>] end_workqueue_fn+0x3c/0x40 [btrfs]
[<ffffffffa03af3f1>] normal_work_helper+0xc1/0x300 [btrfs]
[<ffffffff810a1352>] ? finish_task_switch+0x82/0x280
[<ffffffffa03af702>] btrfs_endio_helper+0x12/0x20 [btrfs]
[<ffffffff81093844>] process_one_work+0x154/0x400
[<ffffffff8109438a>] worker_thread+0x11a/0x460
[<ffffffff8165a24f>] ? __schedule+0x2bf/0x880
[<ffffffff81094270>] ? rescuer_thread+0x2f0/0x2f0
[<ffffffff810993f9>] kthread+0xc9/0xe0
[<ffffffff81099330>] ? kthread_park+0x60/0x60
[<ffffffff8165e14f>] ret_from_fork+0x3f/0x70
[<ffffffff81099330>] ? kthread_park+0x60/0x60
Code: 00 31 c0 eb d5 8d 48 02 eb d9 31 c0 45 89 e0 48 c7 c6 a0 f8 3f a0
48 c7 c7 00 05 41 a0 e8 c9 f2 fa e0 31 c0 e9 70 ff ff ff 0f 0b <0f> 0b
66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90
RIP [<ffffffffa039e0e0>] btrfs_check_repairable+0x100/0x110 [btrfs]
RSP <ffff88007878bcc8>
------------[ cut here ]------------
<more crashes until the system hangs>
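Every csum failure in the log points at the same inode and extent. A quick way to confirm that against the full kernel log is something like the sketch below. The function name is just for illustration, and it assumes each message sits on a single line, as it does in raw dmesg output:

```shell
#!/bin/sh
# Summarise "csum failed" kernel messages read from stdin, grouped by
# inode and extent. Assumes one message per line (unwrapped dmesg output).
summarise_csum() {
    sed -n 's/.*csum failed ino \([0-9][0-9]*\) extent \([0-9][0-9]*\).*/\1 \2/p' |
        sort | uniq -c |
        awk '{ print "ino " $2 ", extent " $3 ": " $1 " failures" }'
}
```

Usage would be along the lines of `dmesg | summarise_csum`, which prints one line per inode/extent pair with its failure count.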
So, where to from here? Sadly, I feel there is data loss in my future,
but I'm not sure how to minimise it :\
--
Steven Haigh
Email: [email protected]
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
