On Fri, Mar 05, 2021 at 12:45:08PM +0000, Filipe Manana wrote: > On Fri, Mar 5, 2021 at 1:08 AM Zygo Blaxell > <ce3g8...@umail.furryterror.org> wrote: > > > > On Tue, Mar 02, 2021 at 04:24:19PM +0000, Filipe Manana wrote: > > > On Sat, Feb 27, 2021 at 3:53 PM Zygo Blaxell > > > <ce3g8...@umail.furryterror.org> wrote: > > > > > > > > Hit this twice so far, while running the usual > > > > balance/dedupe/rsync/snapshots/all at once on: > > > > > > > > a646ddc2bba2 (kdave-gitlab/misc-next) btrfs: unlock extents in > > > > btrfs_zero_range in case of quota reservation errors > > > > > > > > Looks like tree mod log bugs are back (or never went away?). > > > > > > Different bugs causing similar problems. > > > > > > Try this: https://pastebin.com/VkesNs4R > > > > I put that patch on top of a646ddc2bba2 and ran it on the same test VM > > for a few days. It has now reached its previous uptime record without > > incident. > > > > It looks like a good fix. I'll leave it running for a few days more to > > be sure. > > Great! > > Ok, so that seems to confirm what I suspected and made me run into > other sorts of weirdness during logical ino calls (returning > unexpected results). > I haven't hit the BUG_ON() as you do, but if this is indeed caused by > allowing to reuse unwritten extent buffers in the same transaction, > it's no wonder that BUG_ON() and many other weird issues happen. > > Can you now try the following version? > > https://pastebin.com/raw/5VHjzdn6 > > Leave it for at least as many days as you tested the previous patch, > hell, even a week or more if you can.
Just to clean up the email thread: The new patch (5VHjzdn6) has now run 87 hours on top of the original misc-next a646ddc2bba2, exceeding the best uptime without a patch by about 30 hours. Now that I read this again, I notice I forgot to ask if you wanted the new patch instead of the old one, or on top of it. I guess it doesn't matter now--I ran each one separately and they both worked for my test case. > Thanks, much appreciated. > > > > > Thanks! > > > > > Thanks. > > > > > > > > > > > [40422.398920][T28995] BTRFS info (device dm-0): balance: > > > > canceled > > > > [40607.394003][T11577] BTRFS info (device dm-0): balance: start > > > > -dlimit=9 > > > > [40607.398597][T11577] BTRFS info (device dm-0): relocating > > > > block group 315676950528 flags data > > > > [40643.279661][T11577] BTRFS info (device dm-0): found 12686 > > > > extents, loops 1, stage: move data extents > > > > [40692.752695][T11577] BTRFS info (device dm-0): found 12686 > > > > extents, loops 2, stage: update data pointers > > > > [40704.860522][T11577] BTRFS info (device dm-0): relocating > > > > block group 314603208704 flags data > > > > [40704.919977][T19054] ------------[ cut here ]------------ > > > > [40704.921895][T19054] kernel BUG at fs/btrfs/ctree.c:1210! > > > > [40704.923497][T19054] invalid opcode: 0000 [#1] SMP KASAN PTI > > > > [40704.925549][T19054] CPU: 1 PID: 19054 Comm: crawl_335 > > > > Tainted: G W 5.11.0-2d11c0084b02-misc-next+ #89 > > > > [40704.929192][T19054] Hardware name: QEMU Standard PC (i440FX > > > > + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 > > > > [40704.931640][T19054] RIP: > > > > 0010:__tree_mod_log_rewind+0x3b1/0x3c0 > > > > [40704.933301][T19054] Code: 05 48 8d 74 10 65 ba 19 00 00 00 > > > > e8 89 f3 06 00 e9 a7 fd ff ff 4c 8d 7b 2c 4c 89 ff e8 f8 bd c8 ff 48 63 > > > > 43 2c e9 a2 fe ff ff <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 00 0f > > > > 1f 44 00 00 55 48 > > > > [40704.938566][T19054] RSP: 0018:ffffc90001eb70b8 EFLAGS: > > > > 00010297 > > > > [40704.940483][T19054] RAX: 0000000000000000 RBX: > > > > ffff88812344e400 RCX: ffffffffb28933b6 > > > > [40704.942668][T19054] RDX: 0000000000000007 RSI: > > > > dffffc0000000000 RDI: ffff88812344e42c > > > > [40704.945002][T19054] RBP: ffffc90001eb7108 R08: > > > > 1ffff11020b60a20 R09: ffffed1020b60a20 > > > > [40704.948513][T19054] R10: ffff888105b050f9 R11: > > > > ffffed1020b60a1f R12: 00000000000000ee > > > > [40704.951601][T19054] R13: ffff8880195520c0 R14: > > > > ffff8881bc958500 R15: ffff88812344e42c > > > > [40704.954607][T19054] FS: 00007fd1955e8700(0000) > > > > GS:ffff8881f5600000(0000) knlGS:0000000000000000 > > > > [40704.957704][T19054] CS: 0010 DS: 0000 ES: 0000 CR0: > > > > 0000000080050033 > > > > [40704.960125][T19054] CR2: 00007efdb7928718 CR3: > > > > 000000010103a006 CR4: 0000000000170ee0 > > > > [40704.963186][T19054] Call Trace: > > > > [40704.964229][T19054] btrfs_search_old_slot+0x265/0x10d0 > > > > [40704.967068][T19054] ? lock_acquired+0xbb/0x600 > > > > [40704.969148][T19054] ? btrfs_search_slot+0x1090/0x1090 > > > > [40704.971106][T19054] ? free_extent_buffer.part.61+0xd7/0x140 > > > > [40704.973020][T19054] ? free_extent_buffer+0x13/0x20 > > > > [40704.974537][T19054] resolve_indirect_refs+0x3e9/0xfc0 > > > > [40704.976154][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40704.977602][T19054] ? __kasan_check_read+0x11/0x20 > > > > [40704.980765][T19054] ? add_prelim_ref.part.11+0x150/0x150 > > > > [40704.983136][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40704.985206][T19054] ? __kasan_check_read+0x11/0x20 > > > > [40704.987403][T19054] ? lock_acquired+0xbb/0x600 > > > > [40704.989309][T19054] ? __kasan_check_write+0x14/0x20 > > > > [40704.991385][T19054] ? do_raw_spin_unlock+0xa8/0x140 > > > > [40704.993454][T19054] ? rb_insert_color+0x30/0x360 > > > > [40704.995402][T19054] ? prelim_ref_insert+0x12d/0x430 > > > > [40704.997247][T19054] find_parent_nodes+0x5c3/0x1830 > > > > [40704.999305][T19054] ? resolve_indirect_refs+0xfc0/0xfc0 > > > > [40705.000951][T19054] ? lock_release+0xc8/0x620 > > > > [40705.002748][T19054] ? fs_reclaim_acquire+0x67/0xf0 > > > > [40705.004753][T19054] ? lock_acquire+0xc7/0x510 > > > > [40705.006233][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40705.007683][T19054] ? > > > > lockdep_hardirqs_on_prepare+0x160/0x210 > > > > [40705.009677][T19054] ? lock_release+0xc8/0x620 > > > > [40705.011405][T19054] ? fs_reclaim_acquire+0x67/0xf0 > > > > [40705.012937][T19054] ? lock_acquire+0xc7/0x510 > > > > [40705.014293][T19054] ? poison_range+0x38/0x40 > > > > [40705.015635][T19054] ? unpoison_range+0x14/0x40 > > > > [40705.017166][T19054] ? trace_hardirqs_on+0x55/0x120 > > > > [40705.018827][T19054] btrfs_find_all_roots_safe+0x142/0x1e0 > > > > [40705.020610][T19054] ? find_parent_nodes+0x1830/0x1830 > > > > [40705.022573][T19054] ? btrfs_inode_flags_to_xflags+0x50/0x50 > > > > [40705.024743][T19054] iterate_extent_inodes+0x20e/0x580 > > > > [40705.026839][T19054] ? tree_backref_for_extent+0x230/0x230 > > > > [40705.029021][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40705.030432][T19054] ? read_extent_buffer+0xdd/0x110 > > > > [40705.031909][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40705.033274][T19054] ? __kasan_check_read+0x11/0x20 > > > > [40705.034782][T19054] ? lock_acquired+0xbb/0x600 > > > > [40705.036234][T19054] ? __kasan_check_write+0x14/0x20 > > > > [40705.037670][T19054] ? _raw_spin_unlock+0x22/0x30 > > > > [40705.039014][T19054] ? __kasan_check_write+0x14/0x20 > > > > [40705.040419][T19054] iterate_inodes_from_logical+0x129/0x170 > > > > [40705.044668][T19054] ? > > > > iterate_inodes_from_logical+0x129/0x170 > > > > [40705.047340][T19054] ? btrfs_inode_flags_to_xflags+0x50/0x50 > > > > [40705.049936][T19054] ? iterate_extent_inodes+0x580/0x580 > > > > [40705.051938][T19054] ? __vmalloc_node+0x92/0xb0 > > > > [40705.053270][T19054] ? init_data_container+0x34/0xb0 > > > > [40705.054903][T19054] ? init_data_container+0x34/0xb0 > > > > [40705.056876][T19054] ? kvmalloc_node+0x60/0x80 > > > > [40705.058372][T19054] btrfs_ioctl_logical_to_ino+0x158/0x230 > > > > [40705.060233][T19054] btrfs_ioctl+0x205e/0x4040 > > > > [40705.061465][T19054] ? __might_sleep+0x71/0xe0 > > > > [40705.063108][T19054] ? > > > > btrfs_ioctl_get_supported_features+0x30/0x30 > > > > [40705.065188][T19054] ? getrusage+0x4b6/0x9c0 > > > > [40705.066678][T19054] ? __kasan_check_read+0x11/0x20 > > > > [40705.071937][T19054] ? lock_release+0xc8/0x620 > > > > [40705.075584][T19054] ? __might_fault+0x64/0xd0 > > > > [40705.084175][T19054] ? lock_acquire+0xc7/0x510 > > > > [40705.091566][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40705.095180][T19054] ? > > > > lockdep_hardirqs_on_prepare+0x210/0x210 > > > > [40705.099880][T19054] ? > > > > lockdep_hardirqs_on_prepare+0x210/0x210 > > > > [40705.101645][T19054] ? __kasan_check_read+0x11/0x20 > > > > [40705.103977][T19054] ? do_vfs_ioctl+0xfc/0x9d0 > > > > [40705.105936][T19054] ? ioctl_file_clone+0xe0/0xe0 > > > > [40705.107658][T19054] ? lock_downgrade+0x3d0/0x3d0 > > > > [40705.109423][T19054] ? > > > > lockdep_hardirqs_on_prepare+0x210/0x210 > > > > [40705.119233][T19054] ? __kasan_check_read+0x11/0x20 > > > > [40705.123581][T19054] ? lock_release+0xc8/0x620 > > > > [40705.125226][T19054] ? __task_pid_nr_ns+0xd3/0x250 > > > > [40705.126984][T19054] ? lock_acquire+0xc7/0x510 > > > > [40705.128678][T19054] ? __fget_files+0x160/0x230 > > > > [40705.130365][T19054] ? __fget_light+0xf2/0x110 > > > > [40705.131959][T19054] __x64_sys_ioctl+0xc3/0x100 > > > > [40705.133637][T19054] do_syscall_64+0x37/0x80 > > > > [40705.138627][T19054] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > [40705.140552][T19054] RIP: 0033:0x7fd1976e2427 > > > > [40705.142005][T19054] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 > > > > c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 > > > > b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 > > > > f7 d8 64 89 01 48 > > > > [40705.148519][T19054] RSP: 002b:00007fd1955e5cf8 EFLAGS: > > > > 00000246 ORIG_RAX: 0000000000000010 > > > > [40705.151247][T19054] RAX: ffffffffffffffda RBX: > > > > 00007fd1955e5f40 RCX: 00007fd1976e2427 > > > > [40705.153903][T19054] RDX: 00007fd1955e5f48 RSI: > > > > 00000000c038943b RDI: 0000000000000004 > > > > [40705.156554][T19054] RBP: 0000000001000000 R08: > > > > 0000000000000000 R09: 00007fd1955e6120 > > > > [40705.159228][T19054] R10: 0000557835366b00 R11: > > > > 0000000000000246 R12: 0000000000000004 > > > > [40705.161902][T19054] R13: 00007fd1955e5f48 R14: > > > > 00007fd1955e5f40 R15: 00007fd1955e5ef8 > > > > [40705.164567][T19054] Modules linked in: > > > > [40705.165991][T19054] ---[ end trace ec8931a1c36e57be ]--- > > > > > > > > (gdb) l *(__tree_mod_log_rewind+0x3b1) > > > > 0xffffffff81893521 is in __tree_mod_log_rewind > > > > (fs/btrfs/ctree.c:1210). > > > > 1205 * the modification. as we're going > > > > backwards, we do the > > > > 1206 * opposite of each operation here. > > > > 1207 */ > > > > 1208 switch (tm->op) { > > > > 1209 case MOD_LOG_KEY_REMOVE_WHILE_FREEING: > > > > 1210 BUG_ON(tm->slot < n); > > > > 1211 fallthrough; > > > > 1212 case MOD_LOG_KEY_REMOVE_WHILE_MOVING: > > > > 1213 case MOD_LOG_KEY_REMOVE: > > > > 1214 btrfs_set_node_key(eb, > > > > &tm->key, tm->slot); > > > > > > > > > > > > -- > > > Filipe David Manana, > > > > > > “Whether you think you can, or you think you can't — you're right.” > > > > > > > -- > Filipe David Manana, > > “Whether you think you can, or you think you can't — you're right.”