On Mon, May 20, 2024 at 3:27 PM Yu Kuai <[email protected]> wrote:
>
> Hi,
>
> 在 2024/05/20 10:55, Yu Kuai 写道:
> > Hi, Changhui
> >
> > 在 2024/05/20 8:39, Changhui Zhong 写道:
> >> [czhong@vm linux-block]$ git bisect bad
> >> 060406c61c7cb4bbd82a02d179decca9c9bb3443 is the first bad commit
> >> commit 060406c61c7cb4bbd82a02d179decca9c9bb3443
> >> Author: Yu Kuai<[email protected]>
> >> Date: Thu May 9 20:38:25 2024 +0800
> >>
> >> block: add plug while submitting IO
> >>
> >> So that if caller didn't use plug, for example,
> >> __blkdev_direct_IO_simple()
> >> and __blkdev_direct_IO_async(), block layer can still benefit
> >> from caching
> >> nsec time in the plug.
> >>
> >> Signed-off-by: Yu Kuai<[email protected]>
> >>
> >> Link:https://lore.kernel.org/r/[email protected]
> >>
> >> Signed-off-by: Jens Axboe<[email protected]>
> >>
> >> block/blk-core.c | 6 ++++++
> >> 1 file changed, 6 insertions(+)
> >
> > Thanks for the test!
> >
> > I was surprised to see this blamed commit, and after taking a look at
> > raid1 barrier code, I found that there are some known problems, fixed in
> > raid10, while raid1 still unfixed. So I wonder this patch maybe just
> > making the exist problem easier to reporduce.
> >
> > I'll start cooking patches to sync raid10 fixes to raid1, meanwhile,
> > can you change your script to test raid10 as well, if raid10 is fine,
> > I'll give you these patches later to test raid1.
>
> Hi,
>
> Sorry to ask, but since I can't reporduce the problem, and based on
> code reiview, there are multiple potential problems, can you also
> reporduce the problem with following debug patch(just add some debug
> info, no functional changes). So that I can make sure of details of
> the problem.
>
Hi,Kuai
yeah, I can test your patch,
but I hit a problem when applying the patch, please help check it, and
I will test it again after you fix it.
```
patching file drivers/md/raid1.c
patch: **** malformed patch at line 42: idx, nr, RESYNC_DEPTH);
```
Thanks,
Changhui
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 113135e7b5f2..b35b847a9e8b 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -936,6 +936,45 @@ static void flush_pending_writes(struct r1conf *conf)
> spin_unlock_irq(&conf->device_lock);
> }
>
> +static bool waiting_barrier(struct r1conf *conf, int idx)
> +{
> + int nr = atomic_read(&conf->nr_waiting[idx]);
> +
> + if (nr) {
> + printk("%s: idx %d nr_waiting %d\n", __func__, idx, nr);
> + return true;
> + }
> +
> + return false;
> +}
> +
> +static bool waiting_pending(struct r1conf *conf, int idx)
> +{
> + int nr;
> +
> + if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery))
> + return false;
> +
> + if (conf->array_frozen) {
> + printk("%s: array is frozen\n", __func__);
> + return true;
> + }
> +
> + nr = atomic_read(&conf->nr_pending[idx]);
> + if (nr) {
> + printk("%s: idx %d nr_pending %d\n", __func__, idx, nr);
> + return true;
> + }
> +
> + nr = atomic_read(&conf->barrier[idx]);
> + if (nr >= RESYNC_DEPTH) {
> + printk("%s: idx %d barrier %d exceeds %d\n", __func__,
> idx, nr, RESYNC_DEPTH);
> + return true;
> + }
> +
> + return false;
> +}
> +
> /* Barriers....
> * Sometimes we need to suspend IO while we do something else,
> * either some resync/recovery, or reconfigure the array.
> @@ -967,8 +1006,7 @@ static int raise_barrier(struct r1conf *conf,
> sector_t sector_nr)
> spin_lock_irq(&conf->resync_lock);
>
> /* Wait until no block IO is waiting */
> - wait_event_lock_irq(conf->wait_barrier,
> - !atomic_read(&conf->nr_waiting[idx]),
> + wait_event_lock_irq(conf->wait_barrier, !waiting_barrier(conf, idx),
> conf->resync_lock);
>
> /* block any new IO from starting */
> @@ -990,11 +1028,7 @@ static int raise_barrier(struct r1conf *conf,
> sector_t sector_nr)
> * C: while conf->barrier[idx] >= RESYNC_DEPTH, meaning reaches
> * max resync count which allowed on current I/O barrier bucket.
> */
> - wait_event_lock_irq(conf->wait_barrier,
> - (!conf->array_frozen &&
> - !atomic_read(&conf->nr_pending[idx]) &&
> - atomic_read(&conf->barrier[idx]) <
> RESYNC_DEPTH) ||
> - test_bit(MD_RECOVERY_INTR,
> &conf->mddev->recovery),
> + wait_event_lock_irq(conf->wait_barrier, !waiting_pending(conf, idx),
> conf->resync_lock);
>
> if (test_bit(MD_RECOVERY_INTR, &conf->mddev->recovery)) {
>
> Thanks,
> Kuai
>
> >
> > Thanks,
> > Kuai
> >
> > .
> >
>