On Fri, May 20, 2016 at 12:44 AM,  <[email protected]> wrote:
> From: Filipe Manana <[email protected]>
>
> When it's finishing, the device replace code iterates all extent maps
> representing block group and for each one that has a stripe that refers
> to the source device, it replaces its device with the target device.
> However when it replaces the source device with the target device it,
> the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID),
> only after its ID is changed to match the one from the source device.
> This leads to races with the chunk removal code that can temporarly see
> a device with an ID of 0ULL and then attempt to use that ID to remove
> items from the device tree and fail, causing a transaction abort:
>
> [ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) 
> to /dev/sde finished
> [ 9238.594377] ------------[ cut here ]------------
> [ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 
> btrfs_remove_chunk+0x2e5/0x793 [btrfs]
> [ 9238.594403] BTRFS: Transaction aborted (error 1)
> [ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor 
> tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 
> evdev sg i2c_core se
> rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom 
> sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio 
> e1000 scsi_mod fl
> oppy [last unloaded: btrfs]
> [ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted 
> 4.6.0-rc7-btrfs-next-29+ #1
> [ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by 
> qemu-project.org 04/01/2014
> [ 9238.594421]  0000000000000000 ffff88017f1dbc60 ffffffff8126b42c 
> ffff88017f1dbcb0
> [ 9238.594422]  0000000000000000 ffff88017f1dbca0 ffffffff81052b14 
> 00000ad37f1dbd18
> [ 9238.594423]  0000000000000001 ffff88018068a558 ffff88005c4b9c00 
> ffff880233f60db0
> [ 9238.594424] Call Trace:
> [ 9238.594428]  [<ffffffff8126b42c>] dump_stack+0x67/0x90
> [ 9238.594430]  [<ffffffff81052b14>] __warn+0xc2/0xdd
> [ 9238.594432]  [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53
> [ 9238.594434]  [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188
> [ 9238.594450]  [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs]
> [ 9238.594452]  [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc
> [ 9238.594464]  [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 
> [btrfs]
> [ 9238.594476]  [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs]
> [ 9238.594489]  [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs]
> [ 9238.594490]  [<ffffffff8106f403>] kthread+0xd4/0xdc
> [ 9238.594494]  [<ffffffff8149e242>] ret_from_fork+0x22/0x40
> [ 9238.594495]  [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286
> [ 9238.594496] ---[ end trace 183efbe50275f059 ]---
>
> The sequence of steps leading to this is like the following:
>
>               CPU 1                                           CPU 2
>
>  btrfs_dev_replace_finishing()
>
>    at this point
>    dev_replace->tgtdev->devid ==
>    BTRFS_DEV_REPLACE_DEVID (0ULL)
>
>    ...
>
>    btrfs_start_transaction()
>    btrfs_commit_transaction()
>
>                                                      btrfs_delete_unused_bgs()
>                                                        btrfs_remove_chunk()
>
>                                                          looks up for the 
> extent map
>                                                          corresponding to the 
> chunk
>
>                                                          lock_chunks() 
> (chunk_mutex)
>                                                          check_system_chunk()
>                                                          unlock_chunks() 
> (chunk_mutex)
>
>    locks fs_info->chunk_mutex
>
>    btrfs_dev_replace_update_device_in_mapping_tree()
>      --> iterates fs_info->mapping_tree and
>          replaces the device in every extent
>          map's map->stripes[] with
>          dev_replace->tgtdev, which still has
>          an id of 0ULL (BTRFS_DEV_REPLACE_DEVID)
>
>                                                          iterates over all 
> stripes from
>                                                          the extent map
>
>                                                            --> calls 
> btrfs_free_dev_extent()
>                                                                passing it the 
> target device
>                                                                that still has 
> an ID of 0ULL
>
>                                                            --> 
> btrfs_free_dev_extent() fails
>                                                              --> aborts 
> current transaction
>
>    finishes setting up the target device,
>    namely it sets tgtdev->devid to the value
>    of srcdev->devid (which is necessarily > 0)
>
>    frees the srcdev
>
>    unlocks fs_info->chunk_mutex
>
> So fix this by taking the device list mutex while processing the stripes
> for the chunk's extent map. This is similar to the race between device
> replace and block group creation that was fixed by commit 50460e37186a
> ("Btrfs: fix race when finishing dev replace leading to transaction abort").
>
> Signed-off-by: Filipe Manana <[email protected]>


Reviewed-by: Josef Bacik <[email protected]>

Thanks,

Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to