On Sun, May 28, 2017 at 10:31:05PM +0100, fdman...@kernel.org wrote:
> From: Filipe Manana <fdman...@suse.com>
> 
> While punching a hole in a range that is not aligned with the sector size
> (currently the same as the page size) we can end up leaving an extent map
> in memory with a length that is smaller than the sector size, which is
> not expected and can lead to problems. This issue is easily detected
> after the patch from commit a7e3b975a0f9 ("Btrfs: fix reported number of
> inode blocks"), introduced in kernel 4.12-rc1, in a scenario like the
> following for example:
> 
>   $ mkfs.btrfs -f /dev/sdb
>   $ mount /dev/sdb /mnt
>   $ xfs_io -c "pwrite -S 0xaa -b 100K 0 100K" /mnt/foo
>   $ xfs_io -c "fpunch 60K 90K" /mnt/foo
>   $ xfs_io -c "pwrite -S 0xbb -b 100K 50K 100K" /mnt/foo
>   $ xfs_io -c "pwrite -S 0xcc -b 50K 100K 50K" /mnt/foo
>   $ umount /mnt
> 
> After the unmount operation we can see several warnings emitted due to
> underflows related to space reservation counters:
> 
> [ 2837.443299] ------------[ cut here ]------------
> [ 2837.447395] WARNING: CPU: 8 PID: 2474 at fs/btrfs/inode.c:9444 
> btrfs_destroy_inode+0xe8/0x27e [btrfs]
> [ 2837.452108] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse 
> parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev 
> tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 
> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
> raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod 
> cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring 
> virtio e1000 scsi_mod floppy
> [ 2837.458389] CPU: 8 PID: 2474 Comm: umount Tainted: G        W       
> 4.10.0-rc8-btrfs-next-43+ #1
> [ 2837.459754] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 2837.462379] Call Trace:
> [ 2837.462379]  dump_stack+0x68/0x92
> [ 2837.462379]  __warn+0xc2/0xdd
> [ 2837.462379]  warn_slowpath_null+0x1d/0x1f
> [ 2837.462379]  btrfs_destroy_inode+0xe8/0x27e [btrfs]
> [ 2837.462379]  destroy_inode+0x3d/0x55
> [ 2837.462379]  evict+0x177/0x17e
> [ 2837.462379]  dispose_list+0x50/0x71
> [ 2837.462379]  evict_inodes+0x132/0x141
> [ 2837.462379]  generic_shutdown_super+0x3f/0xeb
> [ 2837.462379]  kill_anon_super+0x12/0x1c
> [ 2837.462379]  btrfs_kill_super+0x16/0x21 [btrfs]
> [ 2837.462379]  deactivate_locked_super+0x30/0x68
> [ 2837.462379]  deactivate_super+0x36/0x39
> [ 2837.462379]  cleanup_mnt+0x58/0x76
> [ 2837.462379]  __cleanup_mnt+0x12/0x14
> [ 2837.462379]  task_work_run+0x77/0x9b
> [ 2837.462379]  prepare_exit_to_usermode+0x9d/0xc5
> [ 2837.462379]  syscall_return_slowpath+0x196/0x1b9
> [ 2837.462379]  entry_SYSCALL_64_fastpath+0xab/0xad
> [ 2837.462379] RIP: 0033:0x7f3ef3e6b9a7
> [ 2837.462379] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 
> 00000000000000a6
> [ 2837.462379] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 
> 00007f3ef3e6b9a7
> [ 2837.462379] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
> 0000556f76a3f910
> [ 2837.462379] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 
> 0000000000000015
> [ 2837.462379] R10: 00000000000006b4 R11: 0000000000000246 R12: 
> 00007f3ef436ce64
> [ 2837.462379] R13: 0000000000000000 R14: 0000556f76a39240 R15: 
> 00007ffdd0d8e0e0
> [ 2837.519355] ---[ end trace e79345fe24b30b8d ]---
> [ 2837.596256] ------------[ cut here ]------------
> [ 2837.597625] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:5699 
> btrfs_free_block_groups+0x246/0x3eb [btrfs]
> [ 2837.603547] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse 
> parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev 
> tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 
> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
> raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod 
> cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring 
> virtio e1000 scsi_mod floppy
> [ 2837.659372] CPU: 8 PID: 2474 Comm: umount Tainted: G        W       
> 4.10.0-rc8-btrfs-next-43+ #1
> [ 2837.663359] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 2837.663359] Call Trace:
> [ 2837.663359]  dump_stack+0x68/0x92
> [ 2837.663359]  __warn+0xc2/0xdd
> [ 2837.663359]  warn_slowpath_null+0x1d/0x1f
> [ 2837.663359]  btrfs_free_block_groups+0x246/0x3eb [btrfs]
> [ 2837.663359]  close_ctree+0x1dd/0x2e1 [btrfs]
> [ 2837.663359]  ? evict_inodes+0x132/0x141
> [ 2837.663359]  btrfs_put_super+0x15/0x17 [btrfs]
> [ 2837.663359]  generic_shutdown_super+0x6a/0xeb
> [ 2837.663359]  kill_anon_super+0x12/0x1c
> [ 2837.663359]  btrfs_kill_super+0x16/0x21 [btrfs]
> [ 2837.663359]  deactivate_locked_super+0x30/0x68
> [ 2837.663359]  deactivate_super+0x36/0x39
> [ 2837.663359]  cleanup_mnt+0x58/0x76
> [ 2837.663359]  __cleanup_mnt+0x12/0x14
> [ 2837.663359]  task_work_run+0x77/0x9b
> [ 2837.663359]  prepare_exit_to_usermode+0x9d/0xc5
> [ 2837.663359]  syscall_return_slowpath+0x196/0x1b9
> [ 2837.663359]  entry_SYSCALL_64_fastpath+0xab/0xad
> [ 2837.663359] RIP: 0033:0x7f3ef3e6b9a7
> [ 2837.663359] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 
> 00000000000000a6
> [ 2837.663359] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 
> 00007f3ef3e6b9a7
> [ 2837.663359] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
> 0000556f76a3f910
> [ 2837.663359] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 
> 0000000000000015
> [ 2837.663359] R10: 00000000000006b4 R11: 0000000000000246 R12: 
> 00007f3ef436ce64
> [ 2837.663359] R13: 0000000000000000 R14: 0000556f76a39240 R15: 
> 00007ffdd0d8e0e0
> [ 2837.739445] ---[ end trace e79345fe24b30b8e ]---
> [ 2837.745595] ------------[ cut here ]------------
> [ 2837.746412] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:5700 
> btrfs_free_block_groups+0x261/0x3eb [btrfs]
> [ 2837.747955] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse 
> parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev 
> tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 
> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
> raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod 
> cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring 
> virtio e1000 scsi_mod floppy
> [ 2837.755395] CPU: 8 PID: 2474 Comm: umount Tainted: G        W       
> 4.10.0-rc8-btrfs-next-43+ #1
> [ 2837.756769] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 2837.758526] Call Trace:
> [ 2837.758925]  dump_stack+0x68/0x92
> [ 2837.759383]  __warn+0xc2/0xdd
> [ 2837.759383]  warn_slowpath_null+0x1d/0x1f
> [ 2837.759383]  btrfs_free_block_groups+0x261/0x3eb [btrfs]
> [ 2837.759383]  close_ctree+0x1dd/0x2e1 [btrfs]
> [ 2837.759383]  ? evict_inodes+0x132/0x141
> [ 2837.759383]  btrfs_put_super+0x15/0x17 [btrfs]
> [ 2837.759383]  generic_shutdown_super+0x6a/0xeb
> [ 2837.759383]  kill_anon_super+0x12/0x1c
> [ 2837.759383]  btrfs_kill_super+0x16/0x21 [btrfs]
> [ 2837.759383]  deactivate_locked_super+0x30/0x68
> [ 2837.759383]  deactivate_super+0x36/0x39
> [ 2837.759383]  cleanup_mnt+0x58/0x76
> [ 2837.759383]  __cleanup_mnt+0x12/0x14
> [ 2837.759383]  task_work_run+0x77/0x9b
> [ 2837.759383]  prepare_exit_to_usermode+0x9d/0xc5
> [ 2837.759383]  syscall_return_slowpath+0x196/0x1b9
> [ 2837.759383]  entry_SYSCALL_64_fastpath+0xab/0xad
> [ 2837.759383] RIP: 0033:0x7f3ef3e6b9a7
> [ 2837.759383] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 
> 00000000000000a6
> [ 2837.759383] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 
> 00007f3ef3e6b9a7
> [ 2837.759383] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
> 0000556f76a3f910
> [ 2837.759383] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 
> 0000000000000015
> [ 2837.759383] R10: 00000000000006b4 R11: 0000000000000246 R12: 
> 00007f3ef436ce64
> [ 2837.759383] R13: 0000000000000000 R14: 0000556f76a39240 R15: 
> 00007ffdd0d8e0e0
> [ 2837.777063] ---[ end trace e79345fe24b30b8f ]---
> [ 2837.778235] ------------[ cut here ]------------
> [ 2837.778856] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:9825 
> btrfs_free_block_groups+0x348/0x3eb [btrfs]
> [ 2837.791385] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse 
> parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev 
> tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 
> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
> raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod 
> cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring 
> virtio e1000 scsi_mod floppy
> [ 2837.797711] CPU: 8 PID: 2474 Comm: umount Tainted: G        W       
> 4.10.0-rc8-btrfs-next-43+ #1
> [ 2837.798594] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 2837.800118] Call Trace:
> [ 2837.800515]  dump_stack+0x68/0x92
> [ 2837.801015]  __warn+0xc2/0xdd
> [ 2837.801471]  warn_slowpath_null+0x1d/0x1f
> [ 2837.801698]  btrfs_free_block_groups+0x348/0x3eb [btrfs]
> [ 2837.801698]  close_ctree+0x1dd/0x2e1 [btrfs]
> [ 2837.801698]  ? evict_inodes+0x132/0x141
> [ 2837.801698]  btrfs_put_super+0x15/0x17 [btrfs]
> [ 2837.801698]  generic_shutdown_super+0x6a/0xeb
> [ 2837.801698]  kill_anon_super+0x12/0x1c
> [ 2837.801698]  btrfs_kill_super+0x16/0x21 [btrfs]
> [ 2837.801698]  deactivate_locked_super+0x30/0x68
> [ 2837.801698]  deactivate_super+0x36/0x39
> [ 2837.801698]  cleanup_mnt+0x58/0x76
> [ 2837.801698]  __cleanup_mnt+0x12/0x14
> [ 2837.801698]  task_work_run+0x77/0x9b
> [ 2837.801698]  prepare_exit_to_usermode+0x9d/0xc5
> [ 2837.801698]  syscall_return_slowpath+0x196/0x1b9
> [ 2837.801698]  entry_SYSCALL_64_fastpath+0xab/0xad
> [ 2837.801698] RIP: 0033:0x7f3ef3e6b9a7
> [ 2837.801698] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 
> 00000000000000a6
> [ 2837.801698] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 
> 00007f3ef3e6b9a7
> [ 2837.801698] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
> 0000556f76a3f910
> [ 2837.801698] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 
> 0000000000000015
> [ 2837.801698] R10: 00000000000006b4 R11: 0000000000000246 R12: 
> 00007f3ef436ce64
> [ 2837.801698] R13: 0000000000000000 R14: 0000556f76a39240 R15: 
> 00007ffdd0d8e0e0
> [ 2837.818441] ---[ end trace e79345fe24b30b90 ]---
> [ 2837.818991] BTRFS info (device sdc): space_info 1 has 7974912 free, is not 
> full
> [ 2837.819830] BTRFS info (device sdc): space_info total=8388608, 
> used=417792, pinned=0, reserved=0, may_use=18446744073709547520, readonly=0
> [ 2837.821227] ------------[ cut here ]------------
> [ 2837.821897] WARNING: CPU: 8 PID: 2474 at fs/btrfs/extent-tree.c:9825 
> btrfs_free_block_groups+0x348/0x3eb [btrfs]
> [ 2837.823331] Modules linked in: dm_flakey dm_mod ppdev parport_pc psmouse 
> parport sg pcspkr acpi_cpufreq tpm_tis tpm_tis_core i2c_piix4 i2c_core evdev 
> tpm button serio_raw sunrpc loop autofs4 ext4 crc16 jbd2 mbcache btrfs raid10 
> raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor 
> raid6_pq libcrc32c crc32c_generic raid1 raid0 multipath linear md_mod sr_mod 
> cdrom sd_mod ata_generic virtio_scsi ata_piix libata virtio_pci virtio_ring 
> virtio e1000 scsi_mod floppy
> [ 2837.829575] CPU: 8 PID: 2474 Comm: umount Tainted: G        W       
> 4.10.0-rc8-btrfs-next-43+ #1
> [ 2837.830767] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
> [ 2837.832407] Call Trace:
> [ 2837.832820]  dump_stack+0x68/0x92
> [ 2837.833336]  __warn+0xc2/0xdd
> [ 2837.833561]  warn_slowpath_null+0x1d/0x1f
> [ 2837.833561]  btrfs_free_block_groups+0x348/0x3eb [btrfs]
> [ 2837.833561]  close_ctree+0x1dd/0x2e1 [btrfs]
> [ 2837.833561]  ? evict_inodes+0x132/0x141
> [ 2837.833561]  btrfs_put_super+0x15/0x17 [btrfs]
> [ 2837.833561]  generic_shutdown_super+0x6a/0xeb
> [ 2837.833561]  kill_anon_super+0x12/0x1c
> [ 2837.833561]  btrfs_kill_super+0x16/0x21 [btrfs]
> [ 2837.833561]  deactivate_locked_super+0x30/0x68
> [ 2837.833561]  deactivate_super+0x36/0x39
> [ 2837.833561]  cleanup_mnt+0x58/0x76
> [ 2837.833561]  __cleanup_mnt+0x12/0x14
> [ 2837.833561]  task_work_run+0x77/0x9b
> [ 2837.833561]  prepare_exit_to_usermode+0x9d/0xc5
> [ 2837.833561]  syscall_return_slowpath+0x196/0x1b9
> [ 2837.833561]  entry_SYSCALL_64_fastpath+0xab/0xad
> [ 2837.833561] RIP: 0033:0x7f3ef3e6b9a7
> [ 2837.833561] RSP: 002b:00007ffdd0d8de58 EFLAGS: 00000246 ORIG_RAX: 
> 00000000000000a6
> [ 2837.833561] RAX: 0000000000000000 RBX: 0000556f76a39060 RCX: 
> 00007f3ef3e6b9a7
> [ 2837.833561] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 
> 0000556f76a3f910
> [ 2837.833561] RBP: 0000556f76a3f910 R08: 0000556f76a3e670 R09: 
> 0000000000000015
> [ 2837.833561] R10: 00000000000006b4 R11: 0000000000000246 R12: 
> 00007f3ef436ce64
> [ 2837.833561] R13: 0000000000000000 R14: 0000556f76a39240 R15: 
> 00007ffdd0d8e0e0
> [ 2837.858288] ---[ end trace e79345fe24b30b91 ]---
> [ 2837.858829] BTRFS info (device sdc): space_info 4 has 1073328128 free, is 
> not full
> [ 2837.859721] BTRFS info (device sdc): space_info total=1073741824, 
> used=28672, pinned=0, reserved=0, may_use=319488, readonly=65536
> 
> What happens in the above example is the following:
> 
> 1) When punching the hole, at btrfs_punch_hole(), the variable tail_len
>    is set to 2048 (as tail_start is 148Kb and offset + len is 150Kb).
>    This results in the creation of an extent map with a length of 2Kb
>    starting at file offset 148Kb, through find_first_non_hole() ->
>    btrfs_get_extent().
> 
> 2) The second write (first write after the hole punch operation), sets
>    the range [50Kb, 152Kb[ to delalloc.
> 
> 3) The third write, at btrfs_find_new_delalloc_bytes(), sees the extent
>    map covering the range [148Kb, 150Kb[ and ends up calling
>    set_extent_bit() for the same range, which results in splitting an
>    existing extent state record, covering the range [148Kb, 152Kb[ into
>    two 2Kb extent state records, covering the ranges [148Kb, 150Kb[ and
>    [150Kb, 152Kb[.
> 
> 4) Finally at lock_and_cleanup_extent_if_need(), immediately after calling
>    btrfs_find_new_delalloc_bytes() we clear the delalloc bit from the
>    range [100Kb, 152Kb[ which results in the btrfs_clear_bit_hook()
>    callback being invoked against the two 2Kb extent state records that
>    cover the ranges [148Kb, 150Kb[ and [150Kb, 152Kb[. When called against
>    the first 2Kb extent state, it calls btrfs_delalloc_release_metadata()
>    with a length argument of 2048 bytes. That function rounds up the length
>    to a sector size aligned length, so it ends up considering a length of
>    4096 bytes, and then calls calc_csum_metadata_size(), which decrements
>    the inode's csum_bytes counter by 4096 bytes, leaving it at 0 bytes.
>    The same happens when btrfs_clear_bit_hook() is called against the
>    second 2Kb extent state, covering the range [150Kb, 152Kb[: the length
>    is rounded up to 4096 and calc_csum_metadata_size() is called to
>    decrement another 4096 bytes from the inode's csum_bytes counter,
>    which at that point has a value of 0. This leads to an underflow,
>    which is exactly what triggers the first warning, at
>    btrfs_destroy_inode().
>    All the other warnings relate to several space accounting counters
>    that underflow as well due to similar reasons.
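
To make the arithmetic in step 4 concrete, here is a small userspace
sketch (not kernel code). It assumes 4K sectors and a counter that starts
at 4096, as implied by the description above; round_up_u64(),
release_bytes() and replay_unaligned_release() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE 4096ULL /* assumption: 4K sectors, as in the example */

/* Round len up to the next multiple of align (align must be a power of two). */
uint64_t round_up_u64(uint64_t len, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical stand-in for the release path: the length is rounded up to
 * the sector size before it is subtracted from the counter.
 */
uint64_t release_bytes(uint64_t counter, uint64_t len)
{
	return counter - round_up_u64(len, SECTORSIZE);
}

/* Replay step 4: two 2K extent state records released from a 4K counter. */
uint64_t replay_unaligned_release(void)
{
	uint64_t csum_bytes = 4096; /* assumed starting value */

	csum_bytes = release_bytes(csum_bytes, 2048); /* [148K, 150K[: now 0 */
	csum_bytes = release_bytes(csum_bytes, 2048); /* [150K, 152K[: wraps */
	return csum_bytes;
}
```

With these assumptions the counter wraps to 2^64 - 4096, that is
18446744073709547520, which is also the may_use value printed in the first
space_info line of the log above.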
> 
> So fix the hole punching operation to make sure it never creates extent
> maps with a length that is not aligned to the sector size, as this breaks
> all assumptions and it's a land mine.
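
As a rough illustration of step 1 and of the fix, the sketch below
recomputes the unaligned tail for "fpunch 60K 90K" in userspace. It
assumes 4K sectors and that the tail starts at the end of the hole rounded
down to the sector size; tail_of_hole() and lookup_len() are hypothetical
helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define SECTORSIZE 4096ULL /* assumption: 4K sectors, as in the example */

struct tail {
	uint64_t start; /* first byte of the unaligned tail */
	uint64_t len;   /* bytes from start to the end of the hole */
};

/* Hypothetical helper: unaligned tail of the hole [offset, offset + len[. */
struct tail tail_of_hole(uint64_t offset, uint64_t len)
{
	uint64_t end = offset + len;
	struct tail t;

	assert((SECTORSIZE & (SECTORSIZE - 1)) == 0); /* power of two */
	t.start = end & ~(SECTORSIZE - 1); /* round the end down to a sector */
	t.len = end - t.start;
	return t;
}

/* Length handed to the extent lookup: raw before the fix, rounded up after. */
uint64_t lookup_len(uint64_t tail_len, int fixed)
{
	if (!fixed)
		return tail_len;
	return (tail_len + SECTORSIZE - 1) & ~(SECTORSIZE - 1);
}
```

For offset = 60K and len = 90K this gives a tail of 2048 bytes. Without
rounding, the lookup asks for 2048 bytes; with rounding it asks for a full
4096 byte sector, so the resulting length stays sector aligned.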
> 
> Fixes: d77815461f04 ("btrfs: Avoid trucating page or punching hole in a 
> already existed hole.")
> Cc: <sta...@vger.kernel.org>
> Signed-off-by: Filipe Manana <fdman...@suse.com>
> ---
> 
> V2: Rebased on latest for-linus-4.12 branch from Chris, so that it
>     applies cleanly.
> 
>  fs/btrfs/file.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index da1096eb1a40..928fe290e834 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2390,10 +2390,12 @@ static int fill_holes(struct btrfs_trans_handle *trans,
>   */
>  static int find_first_non_hole(struct inode *inode, u64 *start, u64 *len)
>  {
> +     struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
>       struct extent_map *em;
>       int ret = 0;
>  
> -     em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, *start, *len, 0);
> +     em = btrfs_get_extent(BTRFS_I(inode), NULL, 0, *start,
> +                           round_up(*len, fs_info->sectorsize), 0);

Some time ago I found that punching a hole can create an unaligned extent
map, but I didn't have a case to prove it would cause problems. Thanks for
catching it.

Why not make btrfs_get_extent() always return an aligned extent map,
since every caller follows the rule except this hole punching path?

Thanks,
-liubo
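
For illustration only, aligning inside the lookup could amount to widening
the requested range to sector boundaries before using it. This is a
userspace sketch under that assumption, not a proposal for the actual
btrfs_get_extent() change; struct range and align_range() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical helper: widen [start, start + len[ to sector boundaries. */
struct range align_range(uint64_t start, uint64_t len, uint64_t sectorsize)
{
	struct range r;
	uint64_t end;

	/* The mask arithmetic below requires a power-of-two sector size. */
	assert(sectorsize != 0 && (sectorsize & (sectorsize - 1)) == 0);

	r.start = start & ~(sectorsize - 1);                      /* round down */
	end = (start + len + sectorsize - 1) & ~(sectorsize - 1); /* round up */
	r.len = end - r.start;
	return r;
}
```

With 4K sectors, the 2K tail at 148K from the example becomes a 4K range at
148K, and a range that is already aligned is returned unchanged.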
>       if (IS_ERR(em))
>               return PTR_ERR(em);
>  
> -- 
> 2.11.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html