On Thu, Aug 22 2019,  Guilherme G. Piccoli  wrote:

> Currently md raid0/linear are not provided with any mechanism to validate
> if an array member got removed or failed. The driver keeps sending BIOs
> regardless of the state of array members, and kernel shows state 'clean'
> in the 'array_state' sysfs attribute. This leads to the following
> situation: if a raid0/linear array member is removed and the array is
> mounted, some user writing to this array won't realize that errors are
> happening unless they check dmesg or perform one fsync per written file.
> Despite udev signaling the member device is gone, 'mdadm' cannot issue the
> STOP_ARRAY ioctl successfully, given the array is mounted.
>
> In other words, no -EIO is returned and writes (except direct ones) appear
> normal. Meaning the user might think the wrote data is correctly stored in
> the array, but instead garbage was written given that raid0 does stripping
> (and so, it requires all its members to be working in order to not corrupt
> data). For md/linear, writes to the available members will work fine, but
> if the writes go to the missing member(s), it'll cause a file corruption
> situation, whereas the portion of the writes to the missing devices aren't
> written effectively.
>
> This patch changes this behavior: we check if the block device's gendisk
> is UP when submitting the BIO to the array member, and if it isn't, we flag
> the md device as MD_BROKEN and fail subsequent I/Os to that device; a read
> request to the array requiring data from a valid member is still completed.
> While flagging the device as MD_BROKEN, we also show a rate-limited warning
> in the kernel log.
>
> A new array state 'broken' was added too: it mimics the state 'clean' in
> every aspect, being useful only to distinguish if the array has some member
> missing. We rely on the MD_BROKEN flag to put the array in the 'broken'
> state. This state cannot be written in 'array_state' as it just shows
> one or more members of the array are missing but acts like 'clean', it
> wouldn't make sense to write it.
>
> With this patch, the filesystem reacts much faster to the event of missing
> array member: after some I/O errors, ext4 for instance aborts the journal
> and prevents corruption. Without this change, we're able to keep writing
> in the disk and after a machine reboot, e2fsck shows some severe fs errors
> that demand fixing. This patch was tested in ext4 and xfs filesystems, and
> requires a 'mdadm' counterpart to handle the 'broken' state.
>
> Cc: NeilBrown <[email protected]>
> Cc: Song Liu <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
>
> v2 -> v3:
> * Rebased against md-next.
>
> * Merged both patches in a single one (thanks Song for the
> suggestion); now we fail BIOs and mark array as MD_BROKEN
> when a member is missing. We rely in the MD_BROKEN flag
> to set array to 'broken' state.
>
> * Function is_missing_dev() was removed due to the above.
>
> v1 -> v2:
> * Added handling for md/linear 'broken' state;
> * Check for is_missing_dev() instead of personality (thanks Neil for
> the suggestion);
> * Changed is_missing_dev() handlers to static;
> * Print rate-limited warning in case of more members go away, not only
> the first.
>
> Cover-letter from v1:
> lore.kernel.org/linux-block/[email protected]
>
>
>  drivers/md/md-linear.c |  9 +++++++++
>  drivers/md/md.c        | 22 ++++++++++++++++++----
>  drivers/md/md.h        |  3 +++
>  drivers/md/raid0.c     | 10 ++++++++++
>  4 files changed, 40 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
> index 7354466ddc90..0479ccdbdeeb 100644
> --- a/drivers/md/md-linear.c
> +++ b/drivers/md/md-linear.c
> @@ -258,6 +258,15 @@ static bool linear_make_request(struct mddev *mddev, 
> struct bio *bio)
>                    bio_sector < start_sector))
>               goto out_of_bounds;
>  
> +     if (unlikely(!(tmp_dev->rdev->bdev->bd_disk->flags & GENHD_FL_UP))) {
> +             if (!test_bit(MD_BROKEN, &mddev->flags))
> +                     pr_warn("md: %s: linear array has a missing/failed 
> member\n",
> +                             mdname(mddev));
> +             set_bit(MD_BROKEN, &mddev->flags);

It is a minor point, but I think this would look nicer as
   if (!test_and_set_bit(MD_BROKEN, ....) { .. . }


> +             bio_io_error(bio);
> +             return true;
> +     }
> +
>       if (unlikely(bio_end_sector(bio) > end_sector)) {
>               /* This bio crosses a device boundary, so we have to split it */
>               struct bio *split = bio_split(bio, end_sector - bio_sector,
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index b46bb143e3c5..e7612033005f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -376,6 +376,11 @@ static blk_qc_t md_make_request(struct request_queue *q, 
> struct bio *bio)
>       struct mddev *mddev = q->queuedata;
>       unsigned int sectors;
>  
> +     if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && (rw == WRITE)) {
> +             bio_io_error(bio);
> +             return BLK_QC_T_NONE;
> +     }
> +
>       blk_queue_split(q, &bio);
>  
>       if (mddev == NULL || mddev->pers == NULL) {
> @@ -4158,12 +4163,17 @@ __ATTR_PREALLOC(resync_start, S_IRUGO|S_IWUSR,
>   * active-idle
>   *     like active, but no writes have been seen for a while (100msec).
>   *
> + * broken
> + *     RAID0/LINEAR-only: same as clean, but array is missing a member.
> + *     It's useful because RAID0/LINEAR mounted-arrays aren't stopped
> + *     when a member is gone, so this state will at least alert the
> + *     user that something is wrong.
>   */
>  enum array_state { clear, inactive, suspended, readonly, read_auto, clean, 
> active,
> -                write_pending, active_idle, bad_word};
> +                write_pending, active_idle, broken, bad_word};
>  static char *array_states[] = {
>       "clear", "inactive", "suspended", "readonly", "read-auto", "clean", 
> "active",
> -     "write-pending", "active-idle", NULL };
> +     "write-pending", "active-idle", "broken", NULL };
>  
>  static int match_word(const char *word, char **list)
>  {
> @@ -4179,7 +4189,7 @@ array_state_show(struct mddev *mddev, char *page)
>  {
>       enum array_state st = inactive;
>  
> -     if (mddev->pers && !test_bit(MD_NOT_READY, &mddev->flags))
> +     if (mddev->pers && !test_bit(MD_NOT_READY, &mddev->flags)) {
>               switch(mddev->ro) {
>               case 1:
>                       st = readonly;
> @@ -4199,7 +4209,10 @@ array_state_show(struct mddev *mddev, char *page)
>                               st = active;
>                       spin_unlock(&mddev->lock);
>               }
> -     else {
> +
> +             if (unlikely(test_bit(MD_BROKEN, &mddev->flags)) && st == clean)

I prefer to keep "unlikely" for performance-sensitive code.  This is not
performance sensitive.

Even without those changes:
  Reviewed-by: NeilBrown <[email protected]>


Thanks,
NeilBrown

> +                     st = broken;
> +     } else {
>               if (list_empty(&mddev->disks) &&
>                   mddev->raid_disks == 0 &&
>                   mddev->dev_sectors == 0)
> @@ -4313,6 +4326,7 @@ array_state_store(struct mddev *mddev, const char *buf, 
> size_t len)
>               break;
>       case write_pending:
>       case active_idle:
> +     case broken:
>               /* these cannot be set */
>               break;
>       }
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1edcd967eb8e..240de65e15e8 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -251,6 +251,9 @@ enum mddev_flags {
>       MD_NOT_READY,           /* do_md_run() is active, so 'array_state'
>                                * must not report that array is ready yet
>                                */
> +     MD_BROKEN,              /* This is used in RAID-0/LINEAR only, to stop
> +                              * I/O in case an array member is gone/failed.
> +                              */
>  };
>  
>  enum mddev_sb_flags {
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index bf5cf184a260..7772f5350bf2 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -586,6 +586,16 @@ static bool raid0_make_request(struct mddev *mddev, 
> struct bio *bio)
>  
>       zone = find_zone(mddev->private, &sector);
>       tmp_dev = map_sector(mddev, zone, sector, &sector);
> +
> +     if (unlikely(!(tmp_dev->bdev->bd_disk->flags & GENHD_FL_UP))) {
> +             if (!test_bit(MD_BROKEN, &mddev->flags))
> +                     pr_warn("md: %s: raid0 array has a missing/failed 
> member\n",
> +                             mdname(mddev));
> +             set_bit(MD_BROKEN, &mddev->flags);
> +             bio_io_error(bio);
> +             return true;
> +     }
> +
>       bio_set_dev(bio, tmp_dev->bdev);
>       bio->bi_iter.bi_sector = sector + zone->dev_start +
>               tmp_dev->data_offset;
> -- 
> 2.22.0

Attachment: signature.asc
Description: PGP signature

--
dm-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/dm-devel

Reply via email to