On 10/11/2018 05:13 PM, David Sterba wrote:
> On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
>> This patch set contains an additional fix for a newly exposed bug after
>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>
>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c...@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>
>> The DUP fix is "fix more DUP stripe size handling". I did that one
>> before starting to change more things so it can be applied to earlier
>> LTS kernels.
>>
>> Besides that patch, which is fixing the bug in a way that is least
>> intrusive, I added a bunch of other patches to help getting the chunk
>> allocator code in a state that is a bit less error-prone and
>> bug-attracting.
>>
>> When running this and trying the reproduction scenario, I can now see
>> that the created DUP device extent is 827326464 bytes long, which is
>> good.
>>
>> I wrote and tested this on top of Linus' 4.19-rc5. I still need to put
>> together a list of related use cases and walk through at least the
>> obvious ones, to check that nothing explodes too quickly with these
>> changes. However, I'm quite confident about it, so I'm sharing all of
>> it already.
>>
>> Any feedback and review is appreciated. Be gentle and keep in mind that
>> I'm still very much in a learning stage regarding kernel development.
> 
> The patches look good, thanks. Problem is explained, preparatory work is
> separated from the fix itself.

\o/

>> The stable patches handling workflow is not 100% clear to me yet. I
>> guess I have to add a Fixes: in the DUP patch which points to the
>> previous commit 92e222df7b.
> 
> Almost nobody does it right, no worries. If you can identify a single
> patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
> with version where it makes sense & applies is enough. I do that check
> myself regardless of what's in the patch.

It's 92e222df7b. What I'm not sure about is whether a Fixes: tag will
also catch the copies of that patch that were already backported to LTS
kernels, since 92e222df7b itself has a Fixes: tag in it... So by now the
new bug is in 4.19, 4.14, 4.9, 4.4, 3.16...

> I ran the patches in a VM and hit a division-by-zero in test
> fstests/btrfs/011, stacktrace below. First guess is that it's caused by
> patch 3/6.

Ah interesting, dev replace.

I'll play around with replace and find out how to run the tests properly
and then reproduce this.

The code introduced in patch 3 is removed again in patch 6, so I don't
suspect that one. :)

But, I'll find out.

Thanks for testing.

Hans

> [ 3116.065595] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 1 transid 5 /dev/vdb
> [ 3116.071274] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 2 transid 5 /dev/vdc
> [ 3116.087086] BTRFS info (device vdb): disk space caching is enabled
> [ 3116.088644] BTRFS info (device vdb): has skinny extents
> [ 3116.089796] BTRFS info (device vdb): flagging fs with big metadata feature
> [ 3116.093971] BTRFS info (device vdb): checking UUID tree
> [ 3125.853755] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdd started
> [ 3125.860269] divide error: 0000 [#1] PREEMPT SMP
> [ 3125.861264] CPU: 1 PID: 6477 Comm: btrfs Not tainted 4.19.0-rc7-default+ #288
> [ 3125.862841] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
> [ 3125.865385] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
> [ 3125.870541] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
> [ 3125.871862] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
> [ 3125.873587] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
> [ 3125.874677] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
> [ 3125.875816] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
> [ 3125.876742] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
> [ 3125.877657] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
> [ 3125.878862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3125.880080] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
> [ 3125.881485] Call Trace:
> [ 3125.882105]  do_chunk_alloc+0x266/0x3e0 [btrfs]
> [ 3125.882841]  btrfs_inc_block_group_ro+0x10e/0x160 [btrfs]
> [ 3125.883875]  scrub_enumerate_chunks+0x18b/0x5d0 [btrfs]
> [ 3125.884658]  ? is_module_address+0x11/0x30
> [ 3125.885271]  ? wait_for_completion+0x160/0x190
> [ 3125.885928]  btrfs_scrub_dev+0x1b8/0x5a0 [btrfs]
> [ 3125.887767]  ? start_transaction+0xa1/0x470 [btrfs]
> [ 3125.888648]  btrfs_dev_replace_start.cold.19+0x155/0x17e [btrfs]
> [ 3125.889459]  btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs]
> [ 3125.890251]  btrfs_ioctl+0x2a94/0x31d0 [btrfs]
> [ 3125.890885]  ? do_sigaction+0x7c/0x210
> [ 3125.891731]  ? do_vfs_ioctl+0xa2/0x6b0
> [ 3125.892652]  do_vfs_ioctl+0xa2/0x6b0
> [ 3125.893642]  ? do_sigaction+0x1a7/0x210
> [ 3125.894665]  ksys_ioctl+0x3a/0x70
> [ 3125.895523]  __x64_sys_ioctl+0x16/0x20
> [ 3125.896339]  do_syscall_64+0x5a/0x1a0
> [ 3125.896949]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> [ 3125.897638] RIP: 0033:0x7f6de28ecaa7
> [ 3125.901313] RSP: 002b:00007ffe963da9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [ 3125.902486] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6de28ecaa7
> [ 3125.903538] RDX: 00007ffe963dae00 RSI: 00000000ca289435 RDI: 0000000000000003
> [ 3125.904878] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ 3125.905788] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe963de26f
> [ 3125.906700] R13: 0000000000000001 R14: 0000000000000004 R15: 000055fceeceb2a0
> [ 3125.907954] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
> [ 3125.909342] ---[ end trace 5492bb467d3be2da ]---
> [ 3125.910031] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
> [ 3125.913600] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
> [ 3125.914595] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
> [ 3125.916209] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
> [ 3125.917701] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
> [ 3125.919209] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
> [ 3125.920782] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
> [ 3125.922413] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
> [ 3125.924264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3125.925627] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
> 


-- 
Hans van Kranenburg
