On Wed, Nov 27, 2024 at 11:11 PM Kent Overstreet
<[email protected]> wrote:
> On Wed, Nov 27, 2024 at 10:48:03PM +0100, Jann Horn wrote:
> > On Wed, Nov 27, 2024 at 10:36 PM Kent Overstreet
> > <[email protected]> wrote:
> > > When invoked from aio, mm_struct is guaranteed to outlive the request
> > > since its lifetime is tied to the io_context - but that's not the case
> > > for io_uring, it's possible that a process could be killed and mm_struct
> > > goes away while a request is in flight.
> > >
> > > So if we're submitting the rest of the io asynchronously, we may need a
> > > ref on mm_struct.
> > >
> > > Per Jens, this is not actually a bug because we're not yet flipping on
> > > FMODE_NOWAIT, meaning io_uring will do the submission from an io_worker
> > > kthread - but this patch is necessary for safely flipping on
> > > FMODE_NOWAIT for more efficient submissions in the future.
> >
> > Ah, one thing to look out for is that mm_struct has two types of
> > refcounting, this patch mixes them up - I don't think my explanation
> > of that was very clear:
> >
> >  - mmgrab()/mmdrop() prevent the mm_struct itself being freed, but
> > don't properly keep the mappings in it alive
> >  - mmget()/mmput() fully keep the MM alive
> >
> > An mmget() reference automatically also holds an mmgrab() reference, 
> > basically.
> >
> > For kthread_use_mm(), you need an mmget() reference; so the most
> > straightforward option would be to mmget() when creating the
> > asynchronous work and mmput() when you're done. But that has the
> > disadvantage that the reference keeps the entire mm_struct with all
> > the mappings inside it alive, and it even indirectly holds a reference
> > to all the files mapped in the MM; though I don't know if that's
> > really a problem here.
> >
> > If you want to avoid keeping the MM of an exiting process alive while
> > the I/O is running, the neater pattern is to use mmgrab()/mmdrop() for
> > your long-term reference, and then use mmget_not_zero()/mmput() around
> > kthread_use_mm()/kthread_unuse_mm() (and bail if mmget_not_zero()
> > fails). You can see this pattern in places like
> > vfio_iommu_type1_dma_rw_chunk().
>
> gotcha...
>
> From 3ef3067690d838dfc9a12686caa015e85e217f01 Mon Sep 17 00:00:00 2001
> From: Kent Overstreet <[email protected]>
> Date: Wed, 27 Nov 2024 16:32:26 -0500
> Subject: [PATCH] bcachefs: dio write: Take ref on mm_struct when using
>  asynchronously
>
> When invoked from aio, mm_struct is guaranteed to outlive the request
> since its lifetime is tied to the io_context - but that's not the case
> for io_uring, it's possible that a process could be killed and mm_struct
> goes away while a request is in flight.
>
> So if we're submitting the rest of the io asynchronously, we may need a
> ref on mm_struct.
>
> Per Jens, this is not actually a bug because we're not yet flipping on
> FMODE_NOWAIT, meaning io_uring will do the submission from an io_worker
> kthread - but this patch is necessary for safely flipping on
> FMODE_NOWAIT for more efficient submissions in the future.
>
> Reported-by: Jann Horn <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Signed-off-by: Kent Overstreet <[email protected]>
>
> diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
> index 2089c36b5866..8da0577411e8 100644
> --- a/fs/bcachefs/fs-io-direct.c
> +++ b/fs/bcachefs/fs-io-direct.c
> @@ -226,6 +226,7 @@ struct dio_write {
>         struct mm_struct                *mm;
>         const struct iovec              *iov;
>         unsigned                        loop:1,
> +                                       have_mm_ref:1,
>                                         extending:1,
>                                         sync:1,
>                                         flush:1;
> @@ -390,6 +391,9 @@ static __always_inline long bch2_dio_write_done(struct 
> dio_write *dio)
>
>         kfree(dio->iov);
>
> +       if (dio->have_mm_ref)
> +               mmdrop(dio->mm);
> +
>         ret = dio->op.error ?: ((long) dio->written << 9);
>         bio_put(&dio->op.wbio.bio);
>
> @@ -529,9 +533,24 @@ static __always_inline long bch2_dio_write_loop(struct 
> dio_write *dio)
>
>                 if (unlikely(dio->iter.count) &&
>                     !dio->sync &&
> -                   !dio->loop &&
> -                   bch2_dio_write_copy_iov(dio))
> -                       dio->sync = sync = true;
> +                   !dio->loop) {
> +                       /*
> +                        * Rest of write will be submitted asynchronously -
> +                        * unless copying the iov fails:
> +                        */
> +                       if (likely(!bch2_dio_write_copy_iov(dio))) {
> +                               /*
> +                                * aio guarantees that mm_struct outlives the
> +                                * request, but io_uring does not
> +                                */
> +                               if (dio->mm) {
> +                                       mmgrab(dio->mm);
> +                                       dio->have_mm_ref = true;
> +                               }
> +                       } else {
> +                               dio->sync = sync = true;
> +                       }
> +               }
>
>                 dio->loop = true;
>                 closure_call(&dio->op.cl, bch2_write, NULL, NULL);
> @@ -559,15 +578,25 @@ static __always_inline long bch2_dio_write_loop(struct 
> dio_write *dio)
>
>  static noinline __cold void bch2_dio_write_continue(struct dio_write *dio)
>  {
> -       struct mm_struct *mm = dio->mm;
> +       struct mm_struct *mm = dio->have_mm_ref ? dio->mm: NULL;
>
>         bio_reset(&dio->op.wbio.bio, NULL, REQ_OP_WRITE);
>
> -       if (mm)
> +       if (mm) {
> +               if (unlikely(!mmget_not_zero(mm))) {
> +                       /* process exited */
> +                       dio->op.error = -ESRCH;
> +                       bch2_dio_write_done(dio);
> +                       return;
> +               }
> +
>                 kthread_use_mm(mm);
> +       }
>         bch2_dio_write_loop(dio);
> -       if (mm)
> +       if (mm) {
> +               mmput(mm);
>                 kthread_unuse_mm(mm);
> +       }

I think the mmput() should go after kthread_unuse_mm(), though I don't
know if this ordering actually breaks anything.

Looks fine apart from that at a glance.


>  }
>
>  static void bch2_dio_write_loop_async(struct bch_write_op *op)

Reply via email to