On Thu, 2 Aug 2012 13:46:31 +0200, David Sterba wrote:
> Hi,
> 
> appologies for late reply,
> 
> On Thu, Aug 02, 2012 at 12:40:46PM +0800, Miao Xie wrote:
>> Changelog v1 -> v2:
>> - add comment to explain why we need deal with the delayed items after
>>   snapshot creation and why this operation do not corrupt the metadata.
> 
> I'm sorry, the comment did not fix the bug :)
> 
> The subvol stress is able to hit this:
> 
> [ 2360.444321] ------------[ cut here ]------------
> [ 2360.448019] kernel BUG at fs/btrfs/extent-tree.c:6047!
> [ 2360.448019] invalid opcode: 0000 [#1] SMP
> [ 2360.448019] CPU 0
> [ 2360.448019] Modules linked in: btrfs aoe [last unloaded: btrfs]
> [ 2360.448019]
> [ 2360.448019] Pid: 8212, comm: btrfs Not tainted 3.5.0-default+ #170 Intel 
> Corporation Santa Rosa platform/Matanzas
> [ 2360.448019] RIP: 0010:[<ffffffffa00f62a1>]  [<ffffffffa00f62a1>] 
> run_clustered_refs+0xa11/0xa20 [btrfs]
> [ 2360.448019] RSP: 0018:ffff88003eca1a68  EFLAGS: 00010246
> [ 2360.448019] RAX: 00000000000007ff RBX: ffff880017a694c8 RCX: 
> ffff88003eca1a08
> [ 2360.448019] RDX: ffff880028aa9000 RSI: 00000000000007fe RDI: 
> ffff880064223cf0
> [ 2360.448019] RBP: ffff88003eca1b48 R08: 00000000000007ff R09: 
> ffff88003eca19f8
> [ 2360.448019] R10: ffff88002435d1e8 R11: 0000000000000000 R12: 
> ffff880025d66d28
> [ 2360.448019] R13: ffff880038640000 R14: ffff8800778dfa88 R15: 
> ffff880060f010d0
> [ 2360.448019] FS:  00007f3289f35740(0000) GS:ffff88007dc00000(0000) 
> knlGS:0000000000000000
> [ 2360.448019] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 2360.448019] CR2: ffffffffff600400 CR3: 000000002e112000 CR4: 
> 00000000000007f0
> [ 2360.448019] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
> 0000000000000000
> [ 2360.448019] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 
> 0000000000000400
> [ 2360.448019] Process btrfs (pid: 8212, threadinfo ffff88003eca0000, task 
> ffff88001d834200)
> [ 2360.448019] Stack:
> [ 2360.448019]  0000000000000000 0000000000000000 0000000000000001 
> 0000000000000000
> [ 2360.448019]  00000000000007ed ffff88002435d1e8 000000003eca1b18 
> 0000000000000000
> [ 2360.448019]  0000000000000770 0000000000000000 000000005cb1e000 
> ffff88003eca1c08
> [ 2360.448019] Call Trace:
> [ 2360.448019]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 
> [btrfs]
> [ 2360.448019]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
> [ 2360.448019]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
> [ 2360.448019]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 
> [btrfs]
> [ 2360.448019]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 
> [btrfs]
> [ 2360.448019]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
> [ 2360.448019]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> [ 2360.448019]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
> [ 2360.448019]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
> [ 2360.448019]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
> [ 2360.448019]  [<ffffffffa013fe5a>] 
> btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
> [ 2360.448019]  [<ffffffffa014005d>] 
> btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> [ 2360.448019]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
> [ 2360.448019]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
> [ 2360.448019]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
> [ 2360.448019]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
> [ 2360.448019]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
> [ 2360.448019]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
> [ 2360.448019]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
> [ 2360.448019]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
> [ 2360.448019] Code: 8b 76 40 48 89 d7 48 89 55 a0 e8 2b 74 ff ff 83 f8 17 0f 
> 87 1e ff ff ff 0f 0b 80 fa b2 0f 84 b4 f8 ff ff 0f 0b 0f 0b 0f 0b 0f 0b <0f> 
> 0b 0f 0b 0f 0b 0f 0b 0f 1f 80 00 00 00 00 55 48 89 e5 41 57
> [ 2360.448019] RIP  [<ffffffffa00f62a1>] run_clustered_refs+0xa11/0xa20 
> [btrfs]
> [ 2360.448019]  RSP <ffff88003eca1a68>
> [ 2360.814508] ---[ end trace 555a16cac3620ccb ]---
> [ 2360.820398] note: btrfs[8212] exited with preempt_count 1
> [ 2360.827072] BUG: sleeping function called from invalid context at 
> kernel/rwsem.c:20
> [ 2360.836047] in_atomic(): 1, irqs_disabled(): 0, pid: 8212, name: btrfs
> [ 2360.843859] INFO: lockdep is turned off.
> [ 2360.849021] Pid: 8212, comm: btrfs Tainted: G      D      3.5.0-default+ 
> #170
> [ 2360.849022] Call Trace:
> [ 2360.849027]  [<ffffffff8107a40c>] __might_sleep+0xfc/0x130
> [ 2360.849030]  [<ffffffff818ea0f6>] down_read+0x26/0xa0
> [ 2360.849034]  [<ffffffff810b416b>] acct_collect+0x4b/0x1b0
> [ 2360.849038]  [<ffffffff8104c838>] do_exit+0x718/0x9a0
> [ 2360.849041]  [<ffffffff81049a26>] ? kmsg_dump+0x26/0x140
> [ 2360.849043]  [<ffffffff818ee0c0>] oops_end+0xb0/0xf0
> [ 2360.849046]  [<ffffffff81005a7b>] die+0x5b/0x90
> [ 2360.849048]  [<ffffffff818ed9a4>] do_trap+0xc4/0x170
> [ 2360.849052]  [<ffffffff810030a5>] do_invalid_op+0x95/0xb0
> [ 2360.849067]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> [ 2360.849071]  [<ffffffff813779dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [ 2360.849073]  [<ffffffff818ed260>] ? restore_args+0x30/0x30
> [ 2360.849076]  [<ffffffff818f674b>] invalid_op+0x1b/0x20
> [ 2360.849087]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> [ 2360.849097]  [<ffffffffa00f5f2b>] ? run_clustered_refs+0x69b/0xa20 [btrfs]
> [ 2360.849108]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 
> [btrfs]
> [ 2360.849110]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
> [ 2360.849119]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
> [ 2360.849133]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 
> [btrfs]
> [ 2360.849145]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 
> [btrfs]
> [ 2360.849148]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
> [ 2360.849161]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> [ 2360.849164]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
> [ 2360.849166]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
> [ 2360.849180]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
> [ 2360.849194]  [<ffffffffa013fe5a>] 
> btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
> [ 2360.849207]  [<ffffffffa014005d>] 
> btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> [ 2360.849221]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
> [ 2360.849224]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
> [ 2360.849226]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
> [ 2360.849229]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
> [ 2360.849231]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
> [ 2360.849234]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
> [ 2360.849236]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
> [ 2360.849239]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
> [ 2360.849255] BUG: scheduling while atomic: btrfs/8212/0x10000002
> [ 2360.849256] INFO: lockdep is turned off.
> [ 2360.849257] Modules linked in: btrfs aoe [last unloaded: btrfs]
> [ 2360.849261] Pid: 8212, comm: btrfs Tainted: G      D      3.5.0-default+ 
> #170
> [ 2360.849262] Call Trace:
> [ 2360.849262]  [<ffffffff81078318>] __schedule_bug+0x68/0x90
> [ 2360.849265]  [<ffffffff818eafcc>] __schedule+0x73c/0x810
> [ 2360.849268]  [<ffffffff8107b48a>] __cond_resched+0x2a/0x40
> [ 2360.849270]  [<ffffffff818eb121>] _cond_resched+0x31/0x40
> [ 2360.849273]  [<ffffffff81128e13>] unmap_single_vma+0x493/0x750
> [ 2360.849276]  [<ffffffff811100b0>] ? lru_deactivate_fn+0x1e0/0x1e0
> [ 2360.849279]  [<ffffffff810a4be0>] ? trace_hardirqs_on_caller+0x20/0x1d0
> [ 2360.849281]  [<ffffffff8112986c>] unmap_vmas+0x3c/0x60
> [ 2360.849284]  [<ffffffff81130de1>] exit_mmap+0x81/0x140
> [ 2360.849287]  [<ffffffff81043824>] mmput+0x74/0x130
> [ 2360.849289]  [<ffffffff8104a520>] exit_mm+0x100/0x120
> [ 2360.849292]  [<ffffffff8104c858>] do_exit+0x738/0x9a0
> [ 2360.849294]  [<ffffffff81049a26>] ? kmsg_dump+0x26/0x140
> [ 2360.849297]  [<ffffffff818ee0c0>] oops_end+0xb0/0xf0
> [ 2360.849299]  [<ffffffff81005a7b>] die+0x5b/0x90
> [ 2360.849301]  [<ffffffff818ed9a4>] do_trap+0xc4/0x170
> [ 2360.849304]  [<ffffffff810030a5>] do_invalid_op+0x95/0xb0
> [ 2360.849307]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> [ 2360.849317]  [<ffffffff813779dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
> [ 2360.849320]  [<ffffffff818ed260>] ? restore_args+0x30/0x30
> [ 2360.849322]  [<ffffffff818f674b>] invalid_op+0x1b/0x20
> [ 2360.849325]  [<ffffffffa00f62a1>] ? run_clustered_refs+0xa11/0xa20 [btrfs]
> [ 2360.849335]  [<ffffffffa00f5f2b>] ? run_clustered_refs+0x69b/0xa20 [btrfs]
> [ 2360.849346]  [<ffffffffa00f6479>] btrfs_run_delayed_refs+0x1c9/0x550 
> [btrfs]
> [ 2360.849356]  [<ffffffff810a4d15>] ? trace_hardirqs_on_caller+0x155/0x1d0
> [ 2360.849358]  [<ffffffffa00e306a>] ? btrfs_free_path+0x2a/0x40 [btrfs]
> [ 2360.849367]  [<ffffffffa015c741>] ? btrfs_run_delayed_items+0xf1/0x160 
> [btrfs]
> [ 2360.849380]  [<ffffffffa0108a15>] btrfs_commit_transaction+0x605/0xb00 
> [btrfs]
> [ 2360.849393]  [<ffffffff8109e70d>] ? lock_release_holdtime+0x3d/0x1c0
> [ 2360.849395]  [<ffffffffa013fc88>] ? btrfs_mksubvol+0x298/0x360 [btrfs]
> [ 2360.849409]  [<ffffffff8106d210>] ? wake_up_bit+0x40/0x40
> [ 2360.849411]  [<ffffffff8137d88e>] ? do_raw_spin_unlock+0x5e/0xb0
> [ 2360.849413]  [<ffffffffa013fd48>] btrfs_mksubvol+0x358/0x360 [btrfs]
> [ 2360.849427]  [<ffffffffa013fe5a>] 
> btrfs_ioctl_snap_create_transid+0x10a/0x190 [btrfs]
> [ 2360.849441]  [<ffffffffa014005d>] 
> btrfs_ioctl_snap_create_v2.clone.0+0xfd/0x110 [btrfs]
> [ 2360.849455]  [<ffffffffa01419ee>] btrfs_ioctl+0x48e/0x1340 [btrfs]
> [ 2360.849469]  [<ffffffff818f0f00>] ? do_page_fault+0x2d0/0x580
> [ 2360.849471]  [<ffffffff818eca70>] ? _raw_spin_unlock_irq+0x30/0x50
> [ 2360.849473]  [<ffffffff81078463>] ? finish_task_switch+0x83/0xf0
> [ 2360.849476]  [<ffffffff81161d08>] do_vfs_ioctl+0x98/0x560
> [ 2360.849478]  [<ffffffff818ed215>] ? retint_swapgs+0x13/0x1b
> [ 2360.849481]  [<ffffffff8116221f>] sys_ioctl+0x4f/0x80
> [ 2360.849483]  [<ffffffff818f56e9>] system_call_fastpath+0x16/0x1b
> 
> fs/btrfs/extent-tree.c:6047
> 
> 6046         if (parent > 0) {
> 6047                 BUG_ON(!(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF));
> 6048                 btrfs_set_extent_inline_ref_type(leaf, iref,
> 6049                                                  
> BTRFS_SHARED_BLOCK_REF_KEY);
> 6050                 btrfs_set_extent_inline_ref_offset(leaf, iref, parent);
> 6051         } else {
> 6052                 btrfs_set_extent_inline_ref_type(leaf, iref,
> 6053                                                  
> BTRFS_TREE_BLOCK_REF_KEY);
> 6054                 btrfs_set_extent_inline_ref_offset(leaf, iref, 
> root_objectid);
> 6055         }

This bug is similar to the one which is reported by Daniel J Blueman a month 
ago. And
Josef have fixed it, but the patch has not been merged into for-linus branch 
till now.
Did you applied that patch?

> 
> Currently for-linux hangs early during the test, so I applied V3 patches on 
> top
> of 3.5.
> 
> The filesystem is freshly created, the load is to simultaneously unpack large 
> tar,
> snapshot the fs, delete random snapshot, looped rm of the untarred dir. 
> Crashes after
> some minutes, reliably.

Could you send the test tool to me? I want to look into it.

Thanks
Miao

> 
> Fsck spits lots of errors:
> 
> ref mismatch on [1133031424 4096] extent item 1, found 0
> Backref 1133031424 root 5 not referenced back 0x7d1f40
> Incorrect global backref count on 1133031424 found 1 wanted 0
> backpointer mismatch on [1133031424 4096]
> owner ref check failed [1133031424 4096]
> 
> ref mismatch on [11213131776 16384] extent item 1, found 0
> Incorrect local backref count on 11213131776 root 5 owner 34509 offset 0 
> found 0 wanted 1 back 0x1424d8e0
> backpointer mismatch on [11213131776 16384]
> owner ref check failed [11213131776 16384]
> 
> fs tree 260 refs 6 not found
>         unresolved ref root 263 dir 256 index 4 namelen 14 name 
> snap2748615355 error 600
>         unresolved ref root 267 dir 256 index 4 namelen 14 name 
> snap2748615355 error 600
>         unresolved ref root 269 dir 256 index 4 namelen 14 name 
> snap2748615355 error 600
>         unresolved ref root 273 dir 256 index 4 namelen 14 name 
> snap2748615355 error 600
>         unresolved ref root 274 dir 256 index 4 namelen 14 name 
> snap2748615355 error 600
>         unresolved ref root 276 dir 256 index 4 namelen 14 name 
> snap2748615355 error 600
> 
> 
> I've asked Josef to pull those patches out of btrfs-next, feel free to send 
> me any testing
> version if you can't reproduce it on your side.
> 
> 
> david
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to