On Sun, Nov 11, 2018 at 10:22:15PM +0800, Anand Jain wrote: > v1->v2: > 2/9: Drop writeback required > 3/9: Drop writeback required > 7/9: Use the condition within the WARN_ON() > 6/9: Use the condition within the ASSERT() > > Replace-start and replace-cancel threads can race to create a messy > situation leading to UAF. We use the scrub code to write > the blocks on the replace target. So if we haven't have set the > replace-scrub-running yet, without this patch we just ignore the error > and free the target device. When this happens the system panics with > UAF error. > > Its nice to see that btrfs_dev_replace_finishing() already handles > the ECANCELED (replace canceled) situation, but for an unknown reason > we aren't using it to cleanup the replace cancel situation, instead > we just let the replace cancel ioctl thread to cleanup the target > device and return and out of synchronous with the scrub code. > > This patch 4/9, 5/9 and 6/9 uses the return code of btrfs_scrub_cancel() > to check if the scrub was really running. And if its not then shall > return an error to the user (replace not started error) so that user > can retry replace cancel. And uses btrfs_dev_replace_finishing() code > to cleanup after successful cancel of the replace scrub. > > Further, a suspended replace, when tries to restart, and if it fails > (for example target device missing, or excl ops running) it goes to the > started state, and so the cli 'btrfs replace status /mnt' hangs with no > progress. So patches 2/9 and 3/9 fixes that. > > As the originals code idea of ECANCELED was limited to the situation of > the error only and not user requested, there are unnecessary error log > and warn log which 7/9 and 8/9 patches fixes.
This fixes quite a few problems, namely the crash in scrub_setup_ctx, thanks. I'm going to add the patchset to for-next, the code looks good on first glance.