On Fri, May 20, 2016 at 12:44 AM, <[email protected]> wrote: > From: Filipe Manana <[email protected]> > > When it's finishing, the device replace code iterates all extent maps > representing block group and for each one that has a stripe that refers > to the source device, it replaces its device with the target device. > However when it replaces the source device with the target device it, > the target device still has an ID of 0ULL (BTRFS_DEV_REPLACE_DEVID), > only after its ID is changed to match the one from the source device. > This leads to races with the chunk removal code that can temporarly see > a device with an ID of 0ULL and then attempt to use that ID to remove > items from the device tree and fail, causing a transaction abort: > > [ 9238.594364] BTRFS info (device sdf): dev_replace from /dev/sdf (devid 3) > to /dev/sde finished > [ 9238.594377] ------------[ cut here ]------------ > [ 9238.594402] WARNING: CPU: 14 PID: 21566 at fs/btrfs/volumes.c:2771 > btrfs_remove_chunk+0x2e5/0x793 [btrfs] > [ 9238.594403] BTRFS: Transaction aborted (error 1) > [ 9238.594416] Modules linked in: btrfs crc32c_generic acpi_cpufreq xor > tpm_tis tpm raid6_pq ppdev parport_pc processor psmouse parport i2c_piix4 > evdev sg i2c_core se > rio_raw pcspkr button loop autofs4 ext4 crc16 jbd2 mbcache sr_mod cdrom > sd_mod ata_generic virtio_scsi ata_piix virtio_pci libata virtio_ring virtio > e1000 scsi_mod fl > oppy [last unloaded: btrfs] > [ 9238.594418] CPU: 14 PID: 21566 Comm: btrfs-cleaner Not tainted > 4.6.0-rc7-btrfs-next-29+ #1 > [ 9238.594419] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by > qemu-project.org 04/01/2014 > [ 9238.594421] 0000000000000000 ffff88017f1dbc60 ffffffff8126b42c > ffff88017f1dbcb0 > [ 9238.594422] 0000000000000000 ffff88017f1dbca0 ffffffff81052b14 > 00000ad37f1dbd18 > [ 9238.594423] 0000000000000001 ffff88018068a558 ffff88005c4b9c00 > ffff880233f60db0 > [ 9238.594424] Call Trace: > [ 9238.594428] [<ffffffff8126b42c>] dump_stack+0x67/0x90 > [ 9238.594430] [<ffffffff81052b14>] __warn+0xc2/0xdd > [ 9238.594432] [<ffffffff81052b7a>] warn_slowpath_fmt+0x4b/0x53 > [ 9238.594434] [<ffffffff8116c311>] ? kmem_cache_free+0x128/0x188 > [ 9238.594450] [<ffffffffa04d43f5>] btrfs_remove_chunk+0x2e5/0x793 [btrfs] > [ 9238.594452] [<ffffffff8108e456>] ? arch_local_irq_save+0x9/0xc > [ 9238.594464] [<ffffffffa04a26fa>] btrfs_delete_unused_bgs+0x317/0x382 > [btrfs] > [ 9238.594476] [<ffffffffa04a961d>] cleaner_kthread+0x1ad/0x1c7 [btrfs] > [ 9238.594489] [<ffffffffa04a9470>] ? btree_invalidatepage+0x8e/0x8e [btrfs] > [ 9238.594490] [<ffffffff8106f403>] kthread+0xd4/0xdc > [ 9238.594494] [<ffffffff8149e242>] ret_from_fork+0x22/0x40 > [ 9238.594495] [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286 > [ 9238.594496] ---[ end trace 183efbe50275f059 ]--- > > The sequence of steps leading to this is like the following: > > CPU 1 CPU 2 > > btrfs_dev_replace_finishing() > > at this point > dev_replace->tgtdev->devid == > BTRFS_DEV_REPLACE_DEVID (0ULL) > > ... > > btrfs_start_transaction() > btrfs_commit_transaction() > > btrfs_delete_unused_bgs() > btrfs_remove_chunk() > > looks up for the > extent map > corresponding to the > chunk > > lock_chunks() > (chunk_mutex) > check_system_chunk() > unlock_chunks() > (chunk_mutex) > > locks fs_info->chunk_mutex > > btrfs_dev_replace_update_device_in_mapping_tree() > --> iterates fs_info->mapping_tree and > replaces the device in every extent > map's map->stripes[] with > dev_replace->tgtdev, which still has > an id of 0ULL (BTRFS_DEV_REPLACE_DEVID) > > iterates over all > stripes from > the extent map > > --> calls > btrfs_free_dev_extent() > passing it the > target device > that still has > an ID of 0ULL > > --> > btrfs_free_dev_extent() fails > --> aborts > current transaction > > finishes setting up the target device, > namely it sets tgtdev->devid to the value > of srcdev->devid (which is necessarily > 0) > > frees the srcdev > > unlocks fs_info->chunk_mutex > > So fix this by taking the device list mutex while processing the stripes > for the chunk's extent map. This is similar to the race between device > replace and block group creation that was fixed by commit 50460e37186a > ("Btrfs: fix race when finishing dev replace leading to transaction abort"). > > Signed-off-by: Filipe Manana <[email protected]>
Reviewed-by: Josef Bacik <[email protected]> Thanks, Josef -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html
