On Wed, Nov 27, 2024 at 11:11 PM Kent Overstreet <[email protected]> wrote:
> On Wed, Nov 27, 2024 at 10:48:03PM +0100, Jann Horn wrote:
> > On Wed, Nov 27, 2024 at 10:36 PM Kent Overstreet
> > <[email protected]> wrote:
> > > When invoked from aio, mm_struct is guaranteed to outlive the request
> > > since its lifetime is tied to the io_context - but that's not the case
> > > for io_uring, it's possible that a process could be killed and mm_struct
> > > goes away while a request is in flight.
> > >
> > > So if we're submitting the rest of the io asynchronously, we may need a
> > > ref on mm_struct.
> > >
> > > Per Jens, this is not actually a bug because we're not yet flipping on
> > > FMODE_NOWAIT, meaning io_uring will do the submission from an io_worker
> > > kthread - but this patch is necessary for safely flipping on
> > > FMODE_NOWAIT for more efficient submissions in the future.
> >
> > Ah, one thing to look out for is that mm_struct has two types of
> > refcounting, this patch mixes them up - I don't think my explanation
> > of that was very clear:
> >
> > - mmgrab()/mmdrop() prevent the mm_struct itself being freed, but
> >   don't properly keep the mappings in it alive
> > - mmget()/mmput() fully keep the MM alive
> >
> > An mmget() reference automatically also holds an mmgrab() reference,
> > basically.
> >
> > For kthread_use_mm(), you need an mmget() reference; so the most
> > straightforward option would be to mmget() when creating the
> > asynchronous work and mmput() when you're done. But that has the
> > disadvantage that the reference keeps the entire mm_struct with all
> > the mappings inside it alive, and it even indirectly holds a reference
> > to all the files mapped in the MM; though I don't know if that's
> > really a problem here.
> >
> > If you want to avoid keeping the MM of an exiting process alive while
> > the I/O is running, the neater pattern is to use mmgrab()/mmdrop() for
> > your long-term reference, and then use mmget_not_zero()/mmput() around
> > kthread_use_mm()/kthread_unuse_mm() (and bail if mmget_not_zero()
> > fails). You can see this pattern in places like
> > vfio_iommu_type1_dma_rw_chunk().
>
> gotcha...
>
> From 3ef3067690d838dfc9a12686caa015e85e217f01 Mon Sep 17 00:00:00 2001
> From: Kent Overstreet <[email protected]>
> Date: Wed, 27 Nov 2024 16:32:26 -0500
> Subject: [PATCH] bcachefs: dio write: Take ref on mm_struct when using
>  asynchronously
>
> When invoked from aio, mm_struct is guaranteed to outlive the request
> since its lifetime is tied to the io_context - but that's not the case
> for io_uring, it's possible that a process could be killed and mm_struct
> goes away while a request is in flight.
>
> So if we're submitting the rest of the io asynchronously, we may need a
> ref on mm_struct.
>
> Per Jens, this is not actually a bug because we're not yet flipping on
> FMODE_NOWAIT, meaning io_uring will do the submission from an io_worker
> kthread - but this patch is necessary for safely flipping on
> FMODE_NOWAIT for more efficient submissions in the future.
>
> Reported-by: Jann Horn <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Signed-off-by: Kent Overstreet <[email protected]>
>
> diff --git a/fs/bcachefs/fs-io-direct.c b/fs/bcachefs/fs-io-direct.c
> index 2089c36b5866..8da0577411e8 100644
> --- a/fs/bcachefs/fs-io-direct.c
> +++ b/fs/bcachefs/fs-io-direct.c
> @@ -226,6 +226,7 @@ struct dio_write {
>  	struct mm_struct	*mm;
>  	const struct iovec	*iov;
>  	unsigned		loop:1,
> +				have_mm_ref:1,
>  				extending:1,
>  				sync:1,
>  				flush:1;
> @@ -390,6 +391,9 @@ static __always_inline long bch2_dio_write_done(struct dio_write *dio)
>  
>  	kfree(dio->iov);
>  
> +	if (dio->have_mm_ref)
> +		mmdrop(dio->mm);
> +
>  	ret = dio->op.error ?: ((long) dio->written << 9);
>  	bio_put(&dio->op.wbio.bio);
>  
> @@ -529,9 +533,24 @@ static __always_inline long bch2_dio_write_loop(struct dio_write *dio)
>  
>  	if (unlikely(dio->iter.count) &&
>  	    !dio->sync &&
> -	    !dio->loop &&
> -	    bch2_dio_write_copy_iov(dio))
> -		dio->sync = sync = true;
> +	    !dio->loop) {
> +		/*
> +		 * Rest of write will be submitted asynchronously -
> +		 * unless copying the iov fails:
> +		 */
> +		if (likely(!bch2_dio_write_copy_iov(dio))) {
> +			/*
> +			 * aio guarantees that mm_struct outlives the
> +			 * request, but io_uring does not
> +			 */
> +			if (dio->mm) {
> +				mmgrab(dio->mm);
> +				dio->have_mm_ref = true;
> +			}
> +		} else {
> +			dio->sync = sync = true;
> +		}
> +	}
>  
>  	dio->loop = true;
>  	closure_call(&dio->op.cl, bch2_write, NULL, NULL);
> @@ -559,15 +578,25 @@ static __always_inline long bch2_dio_write_loop(struct dio_write *dio)
>  
>  static noinline __cold void bch2_dio_write_continue(struct dio_write *dio)
>  {
> -	struct mm_struct *mm = dio->mm;
> +	struct mm_struct *mm = dio->have_mm_ref ? dio->mm : NULL;
>  
>  	bio_reset(&dio->op.wbio.bio, NULL, REQ_OP_WRITE);
>  
> -	if (mm)
> +	if (mm) {
> +		if (unlikely(!mmget_not_zero(mm))) {
> +			/* process exited */
> +			dio->op.error = -ESRCH;
> +			bch2_dio_write_done(dio);
> +			return;
> +		}
> +
>  		kthread_use_mm(mm);
> +	}
>  	bch2_dio_write_loop(dio);
> -	if (mm)
> +	if (mm) {
> +		mmput(mm);
>  		kthread_unuse_mm(mm);
> +	}
I think the mmput() should go after kthread_unuse_mm(), though I don't
know if this ordering actually breaks anything. Looks fine apart from
that at a glance.

> }
>
> static void bch2_dio_write_loop_async(struct bch_write_op *op)
