When initiating a stripe-adding reshape, a deadlock occurs between
md_stop_writes(), which waits for the sync thread to stop, and the
running sync thread, which waits for inactive stripes. This happens
frequently on single-core systems but rarely on multi-core systems.

Resolve this by setting MD_RECOVERY_WAIT when the reshape is initiated
via constructor arguments, which requests that md_do_sync(), the main MD
resynchronization thread worker function, bail out. Don't set the flag
when reloading without those arguments, and avoid the superfluous
mddev_{suspend,resume}() calls while setting up the reshape.
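The bail-out can be illustrated with a minimal user-space model,
assuming POSIX threads. All model_* names and MODEL_RECOVERY_WAIT are
hypothetical stand-ins, not md's real types or flag values, and a
100 ms timeout stands in for the indefinite wait so the model always
terminates. The flag is set before the sync thread runs, so the thread
returns at once and the join that models md_stop_writes() completes.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for the MD_RECOVERY_WAIT bit in mddev->recovery. */
#define MODEL_RECOVERY_WAIT (1u << 0)

enum sync_exit { SYNC_BAILED_OUT, SYNC_TIMED_OUT };

/* Hypothetical, heavily reduced model of struct mddev. */
struct model_mddev {
	atomic_uint recovery;          /* models mddev->recovery */
	pthread_mutex_t lock;
	pthread_cond_t stripe_freed;   /* never signalled: no stripe gets freed */
	enum sync_exit result;         /* how the sync thread finished */
};

/* Models the md_do_sync() worker run by the sync thread. */
static void *model_do_sync(void *arg)
{
	struct model_mddev *m = arg;

	/* With the flag set, leave before waiting for any stripe. */
	if (atomic_load(&m->recovery) & MODEL_RECOVERY_WAIT) {
		m->result = SYNC_BAILED_OUT;
		return NULL;
	}

	/* Otherwise wait for an inactive stripe. In the deadlock nothing
	 * frees one; the timeout stands in for "forever" in this model. */
	struct timespec deadline;
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_nsec += 100L * 1000 * 1000;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&m->lock);
	while (pthread_cond_timedwait(&m->stripe_freed, &m->lock,
				      &deadline) != ETIMEDOUT)
		;   /* spurious wakeup: keep waiting until the deadline */
	pthread_mutex_unlock(&m->lock);

	m->result = SYNC_TIMED_OUT;
	return NULL;
}

/* Models the sync thread being started once the reshape is set up,
 * followed by md_stop_writes() waiting for it to stop (pthread_join). */
static enum sync_exit model_start_and_stop(int set_wait_flag)
{
	struct model_mddev m = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.stripe_freed = PTHREAD_COND_INITIALIZER,
	};
	pthread_t sync_thread;
	int rc;

	atomic_init(&m.recovery, 0u);
	if (set_wait_flag)   /* what the patch adds before rs_setup_reshape() */
		atomic_fetch_or(&m.recovery, MODEL_RECOVERY_WAIT);

	rc = pthread_create(&sync_thread, NULL, model_do_sync, &m);
	assert(rc == 0);
	(void)rc;
	pthread_join(sync_thread, NULL);
	return m.result;
}
```

model_start_and_stop(1) returns SYNC_BAILED_OUT immediately, while
model_start_and_stop(0) returns SYNC_TIMED_OUT only because of the
model's timeout; in the driver that wait has no timeout, which is the
hang.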

Passes all lvm2 raid tests.

Signed-off-by: Heinz Mauelshagen <[email protected]>
---
 Documentation/device-mapper/dm-raid.txt |  1 +
 drivers/md/dm-raid.c                    | 13 ++++---------
 2 files changed, 5 insertions(+), 9 deletions(-)
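The version bump to 1.14.0 in this patch gives userspace a way to tell
whether the running raid target carries the fix. A minimal sketch,
assuming the version is available as a "vMAJOR.MINOR.PATCH" string
(for example as listed by `dmsetup targets`) and assuming the upstream
numbering used here; the function name is hypothetical and not taken
from lvm2:

```c
#include <stdio.h>

/* First raid target version listing the reshape deadlock fix. */
static const unsigned fixed_version[3] = {1, 14, 0};

/* Hypothetical helper: parses "vMAJOR.MINOR.PATCH" and returns 1 if the
 * version is at least 1.14.0, 0 if older, -1 if the string is unparsable. */
static int raid_target_has_reshape_fix(const char *version)
{
	unsigned v[3];

	if (sscanf(version, "v%u.%u.%u", &v[0], &v[1], &v[2]) != 3)
		return -1;

	/* Compare major, then minor, then patch level. */
	for (int i = 0; i < 3; i++) {
		if (v[i] != fixed_version[i])
			return v[i] > fixed_version[i];
	}
	return 1;   /* exactly 1.14.0 */
}
```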

diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt
index f68d06d6f28b..efb73f521568 100644
--- a/Documentation/device-mapper/dm-raid.txt
+++ b/Documentation/device-mapper/dm-raid.txt
@@ -349,3 +349,4 @@ Version History
        state races.
 1.13.2  Fix raid redundancy validation and avoid keeping raid set frozen
 1.13.3  Fix reshape race on small devices
+1.14.0  Fix stripe adding reshape deadlock/potential data corruption
diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index ecb7706f7330..03dd915eff9e 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3871,14 +3871,13 @@ static int rs_start_reshape(struct raid_set *rs)
        struct mddev *mddev = &rs->md;
        struct md_personality *pers = mddev->pers;
 
+       /* Don't allow the sync thread to work until the table gets reloaded. */
+       set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
+
        r = rs_setup_reshape(rs);
        if (r)
                return r;
 
-       /* Need to be resumed to be able to start reshape, recovery is frozen until raid_resume() though */
-       if (test_and_clear_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags))
-               mddev_resume(mddev);
-
        /*
         * Check any reshape constraints enforced by the personalility
         *
@@ -3902,10 +3901,6 @@ static int rs_start_reshape(struct raid_set *rs)
                }
        }
 
-       /* Suspend because a resume will happen in raid_resume() */
-       set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags);
-       mddev_suspend(mddev);
-
        /*
         * Now reshape got set up, update superblocks to
         * reflect the fact so that a table reload will
@@ -4002,7 +3997,7 @@ static void raid_resume(struct dm_target *ti)
 
 static struct target_type raid_target = {
        .name = "raid",
-       .version = {1, 13, 3},
+       .version = {1, 14, 0},
        .module = THIS_MODULE,
        .ctr = raid_ctr,
        .dtr = raid_dtr,
-- 
2.17.1

--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel