On Wed, Nov 21, 2018 at 05:10:52PM +0200, Nikolay Borisov wrote:
> Running btrfs/124 in a loop hung up on me sporadically with the
> following call trace:
>       btrfs           D    0  5760   5324 0x00000000
>       Call Trace:
>        ? __schedule+0x243/0x800
>        schedule+0x33/0x90
>        btrfs_start_ordered_extent+0x10c/0x1b0 [btrfs]
>        ? wait_woken+0xa0/0xa0
>        btrfs_wait_ordered_range+0xbb/0x100 [btrfs]
>        btrfs_relocate_block_group+0x1ff/0x230 [btrfs]
>        btrfs_relocate_chunk+0x49/0x100 [btrfs]
>        btrfs_balance+0xbeb/0x1740 [btrfs]
>        btrfs_ioctl_balance+0x2ee/0x380 [btrfs]
>        btrfs_ioctl+0x1691/0x3110 [btrfs]
>        ? lockdep_hardirqs_on+0xed/0x180
>        ? __handle_mm_fault+0x8e7/0xfb0
>        ? _raw_spin_unlock+0x24/0x30
>        ? __handle_mm_fault+0x8e7/0xfb0
>        ? do_vfs_ioctl+0xa5/0x6e0
>        ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
>        do_vfs_ioctl+0xa5/0x6e0
>        ? entry_SYSCALL_64_after_hwframe+0x3e/0xbe
>        ksys_ioctl+0x3a/0x70
>        __x64_sys_ioctl+0x16/0x20
>        do_syscall_64+0x60/0x1b0
>        entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> This happens because during page writeback it's valid for
> writepage_delalloc to instantiate a delalloc range which doesn't
> belong to the page currently being written back.
> 
> The reason this case is valid is due to find_lock_delalloc_range
> returning any available range after the passed delalloc_start and
> ignoring whether the page under writeback is within that range.
> In turn ordered extents (OE) are always created for the returned range
> from find_lock_delalloc_range. If, however, a failure occurs while OE
> are being created then the clean up code in btrfs_cleanup_ordered_extents
> will be called.
> 
> Unfortunately the code in btrfs_cleanup_ordered_extents doesn't consider
> the case of such a 'foreign' range being processed and instead it always
> assumes that the range OE are created for belongs to the page. This
> leads to the first page of such a foreign range not being cleaned up since
> it's deliberately skipped by the current cleanup code.
> 
> Fix this by correctly checking whether the current page belongs to the
> range being instantiated and if so adjust the range parameters
> passed for cleaning up. If it doesn't, then just clean the whole OE
> range directly.
> 
> Signed-off-by: Nikolay Borisov <nbori...@suse.com>
> Reviewed-by: Josef Bacik <jo...@toxicpanda.com>

Added to misc-next, thanks.
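
For anyone following along, the check described in the last paragraph of
the commit message can be sketched roughly as below. This is a standalone
userspace illustration, not the patch itself: sketch_cleanup_range(),
struct cleanup_range and SKETCH_PAGE_SIZE are hypothetical names, a 4 KiB
page size is assumed, and "adjusting the range" is assumed to mean skipping
one page at the start of the range, on the understanding that a page which
does belong to the range is its first page.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical type: the byte range that should be handed to OE cleanup. */
struct cleanup_range {
	uint64_t offset;
	uint64_t bytes;
};

/*
 * Decide which part of the range [offset, offset + bytes) needs cleaning.
 * page_start is the file offset of the page currently under writeback.
 */
static struct cleanup_range
sketch_cleanup_range(uint64_t page_start, uint64_t offset, uint64_t bytes)
{
	struct cleanup_range r = { offset, bytes };
	uint64_t page_end = page_start + SKETCH_PAGE_SIZE - 1;
	uint64_t range_end = offset + bytes - 1;

	/*
	 * The page lies inside the range: skip one page, assuming the
	 * writeback path cleans that page up on its own.
	 */
	if (page_start >= offset && page_end <= range_end) {
		r.offset += SKETCH_PAGE_SIZE;
		r.bytes -= SKETCH_PAGE_SIZE;
	}

	/* Otherwise the range is 'foreign' and is cleaned in full. */
	return r;
}
```

With a page at offset 0 and a range of 16384 bytes starting at 0, the
sketch returns offset 4096 and 12288 bytes. With the same page and a range
of 8192 bytes starting at 8192, the range comes back unchanged, so its
first page is no longer missed.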
