On Wed, Mar 10, 2021 at 5:52 PM Zygo Blaxell
<ce3g8...@umail.furryterror.org> wrote:
>
> On Fri, Mar 05, 2021 at 12:45:08PM +0000, Filipe Manana wrote:
> > On Fri, Mar 5, 2021 at 1:08 AM Zygo Blaxell
> > <ce3g8...@umail.furryterror.org> wrote:
> > >
> > > On Tue, Mar 02, 2021 at 04:24:19PM +0000, Filipe Manana wrote:
> > > > On Sat, Feb 27, 2021 at 3:53 PM Zygo Blaxell
> > > > <ce3g8...@umail.furryterror.org> wrote:
> > > > >
> > > > > Hit this twice so far, while running the usual
> > > > > balance/dedupe/rsync/snapshots/all at once on:
> > > > >
> > > > >         a646ddc2bba2 (kdave-gitlab/misc-next) btrfs: unlock extents 
> > > > > in btrfs_zero_range in case of quota reservation errors
> > > > >
> > > > > Looks like tree mod log bugs are back (or never went away?).
> > > >
> > > > Different bugs causing similar problems.
> > > >
> > > > Try this:   https://pastebin.com/VkesNs4R
> > >
> > > I put that patch on top of a646ddc2bba2 and ran it on the same test VM
> > > for a few days.  It has now reached its previous uptime record without
> > > incident.
> > >
> > > It looks like a good fix.  I'll leave it running for a few days more to
> > > be sure.
> >
> > Great!
> >
> > Ok, so that seems to confirm what I suspected and made me run into
> > other sorts of weirdness during logical ino calls (returning
> > unexpected results).
> > I haven't hit the BUG_ON() as you do, but if this is indeed caused by
> > allowing to reuse unwritten extent buffers in the same transaction,
> > it's no wonder that BUG_ON() and many other weird issues happen.
> >
> > Can you now try the following version?
> >
> > https://pastebin.com/raw/5VHjzdn6
> >
> > Leave it for at least as many days as you tested the previous patch,
> > hell, even a week or more if you can.
>
> Just to clean up the email thread:
>
> The new patch (5VHjzdn6) has now run 87 hours on top of the original
> misc-next a646ddc2bba2, exceeding the best uptime without a patch by
> about 30 hours.

Cool!

>
> Now that I read this again, I notice I forgot to ask if you wanted the new
> patch instead of the old one, or on top of it.  I guess it doesn't matter
> now--I ran each one separately and they both worked for my test case.

The new patch was meant to replace the first one.

Neither of them was meant to be definitive, since they were targeted
at other problems I was hitting sporadically, but they didn't explain
how you could hit that particular BUG_ON().
I've figured out why/how you hit that BUG_ON() and it's now fixed in
the first patch of a patchset I just sent out (I've CCed you).
You'll want to grab the two first patches of that patchset, the rest
are just refactorings and cleanups.

Thanks for all the testing and the report.

>
> > Thanks, much appreciated.
> >
> > >
> > > Thanks!
> > >
> > > > Thanks.
> > > >
> > > > >
> > > > >         [40422.398920][T28995] BTRFS info (device dm-0): balance: 
> > > > > canceled
> > > > >         [40607.394003][T11577] BTRFS info (device dm-0): balance: 
> > > > > start -dlimit=9
> > > > >         [40607.398597][T11577] BTRFS info (device dm-0): relocating 
> > > > > block group 315676950528 flags data
> > > > >         [40643.279661][T11577] BTRFS info (device dm-0): found 12686 
> > > > > extents, loops 1, stage: move data extents
> > > > >         [40692.752695][T11577] BTRFS info (device dm-0): found 12686 
> > > > > extents, loops 2, stage: update data pointers
> > > > >         [40704.860522][T11577] BTRFS info (device dm-0): relocating 
> > > > > block group 314603208704 flags data
> > > > >         [40704.919977][T19054] ------------[ cut here ]------------
> > > > >         [40704.921895][T19054] kernel BUG at fs/btrfs/ctree.c:1210!
> > > > >         [40704.923497][T19054] invalid opcode: 0000 [#1] SMP KASAN PTI
> > > > >         [40704.925549][T19054] CPU: 1 PID: 19054 Comm: crawl_335 
> > > > > Tainted: G        W         5.11.0-2d11c0084b02-misc-next+ #89
> > > > >         [40704.929192][T19054] Hardware name: QEMU Standard PC 
> > > > > (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > > > >         [40704.931640][T19054] RIP: 
> > > > > 0010:__tree_mod_log_rewind+0x3b1/0x3c0
> > > > >         [40704.933301][T19054] Code: 05 48 8d 74 10 65 ba 19 00 00 00 
> > > > > e8 89 f3 06 00 e9 a7 fd ff ff 4c 8d 7b 2c 4c 89 ff e8 f8 bd c8 ff 48 
> > > > > 63 43 2c e9 a2 fe ff ff <0f> 0b 0f 0b 66 66 2e 0f 1f 84 00 00 00 00 
> > > > > 00 0f 1f 44 00 00 55 48
> > > > >         [40704.938566][T19054] RSP: 0018:ffffc90001eb70b8 EFLAGS: 
> > > > > 00010297
> > > > >         [40704.940483][T19054] RAX: 0000000000000000 RBX: 
> > > > > ffff88812344e400 RCX: ffffffffb28933b6
> > > > >         [40704.942668][T19054] RDX: 0000000000000007 RSI: 
> > > > > dffffc0000000000 RDI: ffff88812344e42c
> > > > >         [40704.945002][T19054] RBP: ffffc90001eb7108 R08: 
> > > > > 1ffff11020b60a20 R09: ffffed1020b60a20
> > > > >         [40704.948513][T19054] R10: ffff888105b050f9 R11: 
> > > > > ffffed1020b60a1f R12: 00000000000000ee
> > > > >         [40704.951601][T19054] R13: ffff8880195520c0 R14: 
> > > > > ffff8881bc958500 R15: ffff88812344e42c
> > > > >         [40704.954607][T19054] FS:  00007fd1955e8700(0000) 
> > > > > GS:ffff8881f5600000(0000) knlGS:0000000000000000
> > > > >         [40704.957704][T19054] CS:  0010 DS: 0000 ES: 0000 CR0: 
> > > > > 0000000080050033
> > > > >         [40704.960125][T19054] CR2: 00007efdb7928718 CR3: 
> > > > > 000000010103a006 CR4: 0000000000170ee0
> > > > >         [40704.963186][T19054] Call Trace:
> > > > >         [40704.964229][T19054]  btrfs_search_old_slot+0x265/0x10d0
> > > > >         [40704.967068][T19054]  ? lock_acquired+0xbb/0x600
> > > > >         [40704.969148][T19054]  ? btrfs_search_slot+0x1090/0x1090
> > > > >         [40704.971106][T19054]  ? 
> > > > > free_extent_buffer.part.61+0xd7/0x140
> > > > >         [40704.973020][T19054]  ? free_extent_buffer+0x13/0x20
> > > > >         [40704.974537][T19054]  resolve_indirect_refs+0x3e9/0xfc0
> > > > >         [40704.976154][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40704.977602][T19054]  ? __kasan_check_read+0x11/0x20
> > > > >         [40704.980765][T19054]  ? add_prelim_ref.part.11+0x150/0x150
> > > > >         [40704.983136][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40704.985206][T19054]  ? __kasan_check_read+0x11/0x20
> > > > >         [40704.987403][T19054]  ? lock_acquired+0xbb/0x600
> > > > >         [40704.989309][T19054]  ? __kasan_check_write+0x14/0x20
> > > > >         [40704.991385][T19054]  ? do_raw_spin_unlock+0xa8/0x140
> > > > >         [40704.993454][T19054]  ? rb_insert_color+0x30/0x360
> > > > >         [40704.995402][T19054]  ? prelim_ref_insert+0x12d/0x430
> > > > >         [40704.997247][T19054]  find_parent_nodes+0x5c3/0x1830
> > > > >         [40704.999305][T19054]  ? resolve_indirect_refs+0xfc0/0xfc0
> > > > >         [40705.000951][T19054]  ? lock_release+0xc8/0x620
> > > > >         [40705.002748][T19054]  ? fs_reclaim_acquire+0x67/0xf0
> > > > >         [40705.004753][T19054]  ? lock_acquire+0xc7/0x510
> > > > >         [40705.006233][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40705.007683][T19054]  ? 
> > > > > lockdep_hardirqs_on_prepare+0x160/0x210
> > > > >         [40705.009677][T19054]  ? lock_release+0xc8/0x620
> > > > >         [40705.011405][T19054]  ? fs_reclaim_acquire+0x67/0xf0
> > > > >         [40705.012937][T19054]  ? lock_acquire+0xc7/0x510
> > > > >         [40705.014293][T19054]  ? poison_range+0x38/0x40
> > > > >         [40705.015635][T19054]  ? unpoison_range+0x14/0x40
> > > > >         [40705.017166][T19054]  ? trace_hardirqs_on+0x55/0x120
> > > > >         [40705.018827][T19054]  btrfs_find_all_roots_safe+0x142/0x1e0
> > > > >         [40705.020610][T19054]  ? find_parent_nodes+0x1830/0x1830
> > > > >         [40705.022573][T19054]  ? 
> > > > > btrfs_inode_flags_to_xflags+0x50/0x50
> > > > >         [40705.024743][T19054]  iterate_extent_inodes+0x20e/0x580
> > > > >         [40705.026839][T19054]  ? tree_backref_for_extent+0x230/0x230
> > > > >         [40705.029021][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40705.030432][T19054]  ? read_extent_buffer+0xdd/0x110
> > > > >         [40705.031909][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40705.033274][T19054]  ? __kasan_check_read+0x11/0x20
> > > > >         [40705.034782][T19054]  ? lock_acquired+0xbb/0x600
> > > > >         [40705.036234][T19054]  ? __kasan_check_write+0x14/0x20
> > > > >         [40705.037670][T19054]  ? _raw_spin_unlock+0x22/0x30
> > > > >         [40705.039014][T19054]  ? __kasan_check_write+0x14/0x20
> > > > >         [40705.040419][T19054]  
> > > > > iterate_inodes_from_logical+0x129/0x170
> > > > >         [40705.044668][T19054]  ? 
> > > > > iterate_inodes_from_logical+0x129/0x170
> > > > >         [40705.047340][T19054]  ? 
> > > > > btrfs_inode_flags_to_xflags+0x50/0x50
> > > > >         [40705.049936][T19054]  ? iterate_extent_inodes+0x580/0x580
> > > > >         [40705.051938][T19054]  ? __vmalloc_node+0x92/0xb0
> > > > >         [40705.053270][T19054]  ? init_data_container+0x34/0xb0
> > > > >         [40705.054903][T19054]  ? init_data_container+0x34/0xb0
> > > > >         [40705.056876][T19054]  ? kvmalloc_node+0x60/0x80
> > > > >         [40705.058372][T19054]  btrfs_ioctl_logical_to_ino+0x158/0x230
> > > > >         [40705.060233][T19054]  btrfs_ioctl+0x205e/0x4040
> > > > >         [40705.061465][T19054]  ? __might_sleep+0x71/0xe0
> > > > >         [40705.063108][T19054]  ? 
> > > > > btrfs_ioctl_get_supported_features+0x30/0x30
> > > > >         [40705.065188][T19054]  ? getrusage+0x4b6/0x9c0
> > > > >         [40705.066678][T19054]  ? __kasan_check_read+0x11/0x20
> > > > >         [40705.071937][T19054]  ? lock_release+0xc8/0x620
> > > > >         [40705.075584][T19054]  ? __might_fault+0x64/0xd0
> > > > >         [40705.084175][T19054]  ? lock_acquire+0xc7/0x510
> > > > >         [40705.091566][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40705.095180][T19054]  ? 
> > > > > lockdep_hardirqs_on_prepare+0x210/0x210
> > > > >         [40705.099880][T19054]  ? 
> > > > > lockdep_hardirqs_on_prepare+0x210/0x210
> > > > >         [40705.101645][T19054]  ? __kasan_check_read+0x11/0x20
> > > > >         [40705.103977][T19054]  ? do_vfs_ioctl+0xfc/0x9d0
> > > > >         [40705.105936][T19054]  ? ioctl_file_clone+0xe0/0xe0
> > > > >         [40705.107658][T19054]  ? lock_downgrade+0x3d0/0x3d0
> > > > >         [40705.109423][T19054]  ? 
> > > > > lockdep_hardirqs_on_prepare+0x210/0x210
> > > > >         [40705.119233][T19054]  ? __kasan_check_read+0x11/0x20
> > > > >         [40705.123581][T19054]  ? lock_release+0xc8/0x620
> > > > >         [40705.125226][T19054]  ? __task_pid_nr_ns+0xd3/0x250
> > > > >         [40705.126984][T19054]  ? lock_acquire+0xc7/0x510
> > > > >         [40705.128678][T19054]  ? __fget_files+0x160/0x230
> > > > >         [40705.130365][T19054]  ? __fget_light+0xf2/0x110
> > > > >         [40705.131959][T19054]  __x64_sys_ioctl+0xc3/0x100
> > > > >         [40705.133637][T19054]  do_syscall_64+0x37/0x80
> > > > >         [40705.138627][T19054]  
> > > > > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > >         [40705.140552][T19054] RIP: 0033:0x7fd1976e2427
> > > > >         [40705.142005][T19054] Code: 00 00 90 48 8b 05 69 aa 0c 00 64 
> > > > > c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 
> > > > > 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 
> > > > > 0c 00 f7 d8 64 89 01 48
> > > > >         [40705.148519][T19054] RSP: 002b:00007fd1955e5cf8 EFLAGS: 
> > > > > 00000246 ORIG_RAX: 0000000000000010
> > > > >         [40705.151247][T19054] RAX: ffffffffffffffda RBX: 
> > > > > 00007fd1955e5f40 RCX: 00007fd1976e2427
> > > > >         [40705.153903][T19054] RDX: 00007fd1955e5f48 RSI: 
> > > > > 00000000c038943b RDI: 0000000000000004
> > > > >         [40705.156554][T19054] RBP: 0000000001000000 R08: 
> > > > > 0000000000000000 R09: 00007fd1955e6120
> > > > >         [40705.159228][T19054] R10: 0000557835366b00 R11: 
> > > > > 0000000000000246 R12: 0000000000000004
> > > > >         [40705.161902][T19054] R13: 00007fd1955e5f48 R14: 
> > > > > 00007fd1955e5f40 R15: 00007fd1955e5ef8
> > > > >         [40705.164567][T19054] Modules linked in:
> > > > >         [40705.165991][T19054] ---[ end trace ec8931a1c36e57be ]---
> > > > >
> > > > >         (gdb) l *(__tree_mod_log_rewind+0x3b1)
> > > > >         0xffffffff81893521 is in __tree_mod_log_rewind 
> > > > > (fs/btrfs/ctree.c:1210).
> > > > >         1205                     * the modification. as we're going 
> > > > > backwards, we do the
> > > > >         1206                     * opposite of each operation here.
> > > > >         1207                     */
> > > > >         1208                    switch (tm->op) {
> > > > >         1209                    case MOD_LOG_KEY_REMOVE_WHILE_FREEING:
> > > > >         1210                            BUG_ON(tm->slot < n);
> > > > >         1211                            fallthrough;
> > > > >         1212                    case MOD_LOG_KEY_REMOVE_WHILE_MOVING:
> > > > >         1213                    case MOD_LOG_KEY_REMOVE:
> > > > >         1214                            btrfs_set_node_key(eb, 
> > > > > &tm->key, tm->slot);
> > > >
> > > >
> > > >
> > > > --
> > > > Filipe David Manana,
> > > >
> > > > “Whether you think you can, or you think you can't — you're right.”
> > > >
> >
> >
> >
> > --
> > Filipe David Manana,
> >
> > “Whether you think you can, or you think you can't — you're right.”



-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

Reply via email to