Am Sat, 26 Mar 2016 20:30:35 +0100
schrieb Kai Krakow <hurikha...@gmail.com>:

> Am Wed, 23 Mar 2016 12:16:24 +0800
> schrieb Qu Wenruo <quwen...@cn.fujitsu.com>:
> 
> > Kai Krakow wrote on 2016/03/22 19:48 +0100:  
> > > Am Tue, 22 Mar 2016 16:47:10 +0800
> > > schrieb Qu Wenruo <quwen...@cn.fujitsu.com>:
> > >    
>  [...]  
> >  [...]    
>  [...]  
> > >
> > > Apparently, that system does not boot now due to errors in bcache
> > > b-tree. That being that, it may well be some bcache error and not
> > > btrfs' fault. Apparently I couldn't catch the output, I've been
> > > in a hurry. It said "write error" and had some backtrace. I will
> > > come to this back later.
> > >
> > > Let's go to the system I currently care about (that one with the
> > > always breaking VDI file):
> > >    
> >  [...]    
>  [...]  
> > >
> > > After the error occured?
> > >
> > > Yes, some text about the extent being compressed and btrfs repair
> > > doesn't currently handle that case (I tried --repair as I'm
> > > having a backup). I simply decided not to investigate that
> > > further at that point but delete and restore the affected file
> > > from backup. However, this is the message from dmesg (tho, I
> > > didn't catch the backtrace):
> > >
> > > btrfs_run_delayed_refs:2927: errno=-17 Object already exists    
> > 
> > That's nice, at least we have some clue.
> > 
> > It's almost sure, it's a bug either in btrfs kernel which doesn't
> > handle delayed refs well(low possibility), or, corrupted fs which
> > create something kernel can't handle(I bet that's the case).  
> 
> [kernel 4.5.0 gentoo, btrfs-progs 4.4.1]
> 
> Well, this time it hit me on the USB backup drive which uses no bcache
> and no other fancy options except compress-force=zlib. Apparently,
> I've only got a (real) screenshot which I'm going to link here:
> 
> https://www.dropbox.com/s/9qbc7np23y8lrii/IMG_20160326_200033.jpg?dl=0
> 
> The same drive has no problems except "bad metadata crossing stripe
> boundary" - but a lot of them. This drive was never converted, it was
> freshly generated several months ago.
> [...]

I finally got copy&paste data:

# before mounting let's check the FS:

$ sudo btrfsck /dev/disk/by-label/usb-backup 
Checking filesystem on /dev/disk/by-label/usb-backup
UUID: 1318ec21-c421-4e36-a44a-7be3d41f9c3f
checking extents
bad metadata [156041216, 156057600) crossing stripe boundary
bad metadata [181403648, 181420032) crossing stripe boundary
bad metadata [392167424, 392183808) crossing stripe boundary
bad metadata [783482880, 783499264) crossing stripe boundary
bad metadata [784924672, 784941056) crossing stripe boundary
bad metadata [130151612416, 130151628800) crossing stripe boundary
bad metadata [162826813440, 162826829824) crossing stripe boundary
bad metadata [162927083520, 162927099904) crossing stripe boundary
bad metadata [619740659712, 619740676096) crossing stripe boundary
bad metadata [619781947392, 619781963776) crossing stripe boundary
bad metadata [619795644416, 619795660800) crossing stripe boundary
bad metadata [619816091648, 619816108032) crossing stripe boundary
bad metadata [620011388928, 620011405312) crossing stripe boundary
bad metadata [890992459776, 890992476160) crossing stripe boundary
bad metadata [891022737408, 891022753792) crossing stripe boundary
bad metadata [891101773824, 891101790208) crossing stripe boundary
bad metadata [891301199872, 891301216256) crossing stripe boundary
bad metadata [1012219314176, 1012219330560) crossing stripe boundary
bad metadata [1017202409472, 1017202425856) crossing stripe boundary
bad metadata [1017365397504, 1017365413888) crossing stripe boundary
bad metadata [1020764422144, 1020764438528) crossing stripe boundary
bad metadata [1251103342592, 1251103358976) crossing stripe boundary
bad metadata [1251144695808, 1251144712192) crossing stripe boundary
bad metadata [1251147055104, 1251147071488) crossing stripe boundary
bad metadata [1259271225344, 1259271241728) crossing stripe boundary
bad metadata [1266223611904, 1266223628288) crossing stripe boundary
bad metadata [1304750063616, 1304750080000) crossing stripe boundary
bad metadata [1304790106112, 1304790122496) crossing stripe boundary
bad metadata [1304850792448, 1304850808832) crossing stripe boundary
bad metadata [1304869928960, 1304869945344) crossing stripe boundary
bad metadata [1305089540096, 1305089556480) crossing stripe boundary
bad metadata [1309561651200, 1309561667584) crossing stripe boundary
bad metadata [1309581443072, 1309581459456) crossing stripe boundary
bad metadata [1309583671296, 1309583687680) crossing stripe boundary
bad metadata [1309942808576, 1309942824960) crossing stripe boundary
bad metadata [1310050549760, 1310050566144) crossing stripe boundary
bad metadata [1313031585792, 1313031602176) crossing stripe boundary
bad metadata [1313232912384, 1313232928768) crossing stripe boundary
bad metadata [1555210764288, 1555210780672) crossing stripe boundary
bad metadata [1555395182592, 1555395198976) crossing stripe boundary
bad metadata [2050576744448, 2050576760832) crossing stripe boundary
bad metadata [2050803957760, 2050803974144) crossing stripe boundary
bad metadata [2050969108480, 2050969124864) crossing stripe boundary
checking free space tree cache and super generation don't match, space
cache will be invalidated checking fs roots
checking csums
checking root refs
found 1860217443214 bytes used err is 0
total csum bytes: 1805105116
total tree bytes: 11793776640
total fs tree bytes: 8220835840
total extent tree bytes: 1443315712
btree space waste bytes: 2307850845
file data blocks allocated: 2137151094784
 referenced 2706830905344

# now let's wait for the backup to mount the FS and look at dmesg:

[21375.606479] BTRFS info (device sde1): force zlib compression
[21375.606483] BTRFS info (device sde1): using free space tree
[21375.606485] BTRFS: has skinny extents
[21383.710725] BTRFS: checking UUID tree
[21388.531267] ------------[ cut here ]------------
[21388.531279] WARNING: CPU: 0 PID: 27085 at
fs/btrfs/extent-tree.c:2946 btrfs_run_delayed_refs+0x279/0x2b0()
[21388.531281] BTRFS: Transaction aborted (error -17) [21388.531282]
Modules linked in: nvidia_drm(PO) uas usb_storage vboxnetadp(O)
vboxnetflt(O) vboxdrv(O) nvidia_modeset(PO) nvidia(PO) [21388.531293]
CPU: 0 PID: 27085 Comm: kworker/u8:3 Tainted: P           O
4.5.0-gentoo #1 [21388.531295] Hardware name: To Be Filled By O.E.M. To
Be Filled By O.E.M./Z68 Pro3, BIOS L2.16A 02/22/2013 [21388.531300]
Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [21388.531302]
0000000000000000 ffffffff8159eaa9 ffff88029d723d68 ffffffff81ea1cf6
[21388.531306]  ffffffff810c6e37 ffff880407be5960 ffff88029d723db8
0000000000000020 [21388.531308]  ffff8802bd886510 ffff8803cc846d00
ffffffff810c6eb7 ffffffff81e8a088 [21388.531311] Call Trace:
[21388.531317]  [<ffffffff8159eaa9>] ? dump_stack+0x46/0x5d
[21388.531322]  [<ffffffff810c6e37>] ? warn_slowpath_common+0x77/0xb0
[21388.531326]  [<ffffffff810c6eb7>] ? warn_slowpath_fmt+0x47/0x50
[21388.531329]  [<ffffffff814923d9>] ?
btrfs_run_delayed_refs+0x279/0x2b0 [21388.531332]
[<ffffffff8149243d>] ? delayed_ref_async_start+0x2d/0x70
[21388.531337]  [<ffffffff810da8c3>] ? process_one_work+0x133/0x350
[21388.531340]  [<ffffffff810dae25>] ? worker_thread+0x45/0x450
[21388.531343]  [<ffffffff810dade0>] ? rescuer_thread+0x300/0x300
[21388.531347]  [<ffffffff810df3f8>] ? kthread+0xb8/0xd0
[21388.531350]  [<ffffffff810e301f>] ? finish_task_switch+0x6f/0x1f0
[21388.531352]  [<ffffffff810df340>] ? kthread_park+0x50/0x50
[21388.531356]  [<ffffffff81bdb31f>] ? ret_from_fork+0x3f/0x70
[21388.531359]  [<ffffffff810df340>] ? kthread_park+0x50/0x50
[21388.531361] ---[ end trace 69a4c78997ef63b2 ]--- [21388.531364]
BTRFS: error (device sde1) in btrfs_run_delayed_refs:2946: errno=-17
Object already exists [21388.531367] BTRFS info (device sde1): forced
readonly

This FS has been okay just a few reboots earlier (except the stripe
boundary thing).

History of kernel versions used on the system (back to the date when I
think the FS was still clean/not corrupted):

     Wed Aug  5 08:12:01 2015 >>> sys-kernel/gentoo-sources-4.1.4
     Sat Aug 15 19:04:34 2015 >>> sys-kernel/gentoo-sources-4.1.5
     Sat Aug 22 11:10:53 2015 >>> sys-kernel/gentoo-sources-4.1.6
     Tue Sep  1 00:03:38 2015 >>> sys-kernel/gentoo-sources-4.2.0
     Sat Sep  5 17:45:11 2015 >>> sys-kernel/gentoo-sources-4.2.0-r1
     Wed Sep 30 04:55:55 2015 >>> sys-kernel/gentoo-sources-4.2.2
     Tue Oct  6 22:55:19 2015 >>> sys-kernel/gentoo-sources-4.2.3
     Sun Oct 25 03:45:03 2015 >>> sys-kernel/gentoo-sources-4.2.4
     Wed Nov  4 04:21:38 2015 >>> sys-kernel/gentoo-sources-4.2.5
     Mon Nov  9 21:36:31 2015 >>> sys-kernel/gentoo-sources-4.3.0
     Sat Nov 14 10:48:03 2015 >>> sys-kernel/gentoo-sources-4.2.6
     Sun Dec 13 13:03:12 2015 >>> sys-kernel/gentoo-sources-4.2.7
     Tue Dec 15 22:37:53 2015 >>> sys-kernel/gentoo-sources-4.2.8
     Tue Dec 22 14:37:01 2015 >>> sys-kernel/gentoo-sources-4.3.3
     Fri Jan 22 20:38:32 2016 >>> sys-kernel/gentoo-sources-4.3.3-r1
     Wed Jan 27 09:27:57 2016 >>> sys-kernel/gentoo-sources-4.3.4
     Sun Jan 31 10:25:00 2016 >>> sys-kernel/gentoo-sources-4.4.0-r1
     Mon Feb  1 20:40:17 2016 >>> sys-kernel/gentoo-sources-4.4.1
     Fri Feb 19 22:11:45 2016 >>> sys-kernel/gentoo-sources-4.4.2
     Sat Feb 27 15:10:45 2016 >>> sys-kernel/gentoo-sources-4.4.3
     Sun Mar  6 14:58:05 2016 >>> sys-kernel/gentoo-sources-4.4.4
     Sat Mar 12 19:21:15 2016 >>> sys-kernel/gentoo-sources-4.4.5
     Sat Mar 19 06:13:09 2016 >>> sys-kernel/gentoo-sources-4.4.6
     Sun Mar 20 20:45:21 2016 >>> sys-kernel/gentoo-sources-4.5.0

I only saw unreliable behavior with 4.4.5, 4.4.6, and 4.5.0 tho the
problem may exist longer in my FS.

$ sudo btrfs-show-super /dev/sde1
superblock: bytenr=65536, device=/dev/sde1
---------------------------------------------------------
csum                    0xcc976d97 [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    1318ec21-c421-4e36-a44a-7be3d41f9c3f
label                   usb-backup
generation              50814
root                    1251250159616
sys_array_size          129
chunk_root_generation   50784
root_level              1
chunk_root              2516518567936
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             2000397864960
bytes_used              1860398493696
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x1
incompat_flags          0x169
                        ( MIXED_BACKREF |
                          COMPRESS_LZO |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA )
csum_type               0
csum_size               4
cache_generation        50208
uuid_tree_generation    50742
dev_item.uuid           9008d5a0-ac7b-4505-8193-27428429f953
dev_item.fsid           1318ec21-c421-4e36-a44a-7be3d41f9c3f [match]
dev_item.type           0
dev_item.total_bytes    2000397864960
dev_item.bytes_used     1912308039680
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0



BTW: btrfsck thinks that the space tree is invalid every time it is
run, no matter if cleanly unmounted, uncleanly unmounted, or "btrfsck
--repair" and then ran a second time.

-- 
Regards,
Kai

Replies to list-only preferred.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to